PREFACE


The  purpose  of  IDA  Paper  P-2132,  SDS  Software  Testing  and  Evaluation:  A  Review  of  the  State-of- 
the-Art  in  Software  Testing  and  Evaluation  With  Recommended  R&D  Tasks,  is  to  identify  the  technology 
required  for  effective  and  efficient  testing  and  evaluation  of  Strategic  Defense  System  (SDS)  software. 
This  document  was  prepared  for  the  Strategic  Defense  Initiative  Organization  (SDIO),  and  provides  an 
overview of current testing and evaluation technology, a mapping of available technology against SDS
needs,  and  recommendations  to  close  critical  gaps  in  technology. 

IDA Memorandum M-496 is a related document which provides a comprehensive, annotated bibliography
of the reference material acquired in the course of this work. IDA Memorandum M-513, another
related document, is a collection of the papers provided by leading experts in testing and evaluation
technology preparatory to an IDA workshop held in support of this work.

The authors gratefully acknowledge the support given by all those who participated in the workshop
and reviewed this paper. The insights and experience these people so willingly shared have been
invaluable. In particular, special thanks go to the following people:
EXECUTIVE SUMMARY


Introduction

Testing and evaluation are recognized as pivotal problems in the development of the Strategic Defense
System (SDS). Consequently, a development approach combining design, prototyping, and simulation is
being used to allow early evaluation of system requirements and designs. Loosely-coupled, decentralized
system architectures are being examined for their facility to reduce the testing problem to a manageable
level. But what testing and evaluation technology shall be used to ensure the reliability of, and provide
the necessary level of confidence in, SDS software? The purpose of this report is to address this
question.

For  the  past  decade,  software  testing  and  evaluation  have  been  relatively  unpopular  subjects.  Why  this 
happened  is  less  important  than  its  consequences.  It  has  been  a  contributing  factor  to  the  lag  between  the 
state-of-the-art  and  state-of-the-practice,  which  is  the  largest  of  any  in  the  diverse  areas  of  software 
engineering,  and  has  resulted  in  a  shortage  of  research  and  development  (R&D)  resources  and  a  small, 
weak  research  community.  Since  SDS  prototype  software  is  already  being  developed,  software  testing 
and evaluation are now critical concerns. The Strategic Defense Initiative Organization (SDIO) cannot
wait  for  testing  and  evaluation  issues  to  pop  up  later  in  the  development  process  (a  traditional  approach 
which  has  proven  costly  in  the  past),  but  must  meet  the  challenge  up  front. 

Once the importance of this challenge is recognized, appropriate actions can be taken. There is a
substantial body of testing and evaluation technology whose use in an industrial environment is ready to be
investigated, preparatory to transitioning the technology into practice. The main body of this report
describes the current status of different areas of this technology. It also outlines a number of tasks for
bringing promising elements of the technology into practice. While the use of advanced technology will
certainly help, it is by no means sufficient to resolve all testing and evaluation problems. In particular,
technology for the testing and evaluation of large, distributed and real-time software systems is still in its
infancy. The final part of this report, therefore, recommends a number of tasks to extend the boundaries
of technology to meet SDS needs.

Testing and evaluation technology is not, by itself, enough to ensure improved practices. Policy must
evolve in step with the technology to ensure its proper application. The process of setting and revising
policy is invariably lengthy and, therefore, policy inevitably lags behind the state-of-the-art.
Consequently, it is important that SDIO testing and evaluation policies be designed to encourage carefully
considered innovation, rather than dictate the use of particular techniques. The disciplined development
approaches needed to produce reliable software and facilitate testing and evaluation are another policy
issue, and one already under consideration in the evolving SDIO Software Policy (see Section 2.4.3).
Indeed, it is important to emphasize that reliability cannot be tested or evaluated into software. Reliable
SDS software can only be achieved through improvements in the “upstream” stages of the development
life cycle. Testing and evaluation during these activities will provide the necessary feedback to ensure
that these activities are performed properly.

Although this report specifically addresses testing and evaluation for SDS software, advances in
testing and evaluation practices achieved for SDS could be exploited to benefit all DOD software efforts.

SDS  Software  Testing  and  Evaluation  Needs 

The SDS will possess many characteristics which stress current testing and evaluation technology.
This fact, combined with the inability to conduct full-scale operational testing in the usual manner,
requires that software testing and evaluation be assigned a central role in the development of the system
as a whole.

At a high level of detail, critical needs for SDS software testing and evaluation revolve around the
following issues:

•  Planning  for  software  testing  and  evaluation  must  start  with  the  user’s  definition 
of  operational  system  requirements,  and  system  requirements  must  be  reviewed 
against  the  ability  to  conduct  needed  testing  and  evaluation. 

•  Testing  and  evaluation  must  be  thoroughly  integrated  into  a  development  life 
cycle  which  includes  prototyping,  simulation,  and  the  use  of  formal 
specifications. 

•  The  introduction  and  use  of  testing  and  evaluation  technology  must  be  carefully 
planned  to  reflect  programmatic  concerns,  including  the  need  for  well-defined 
organizational  support,  roles,  and  policies. 

The key mechanisms proposed for meeting these needs are:

• An overall test plan concept which institutionalizes an SDS testing and evaluation
process model. The model, itself, should provide an evolutionary framework
showing (1) how testing and evaluation fit into development activities, and
(2) what specific technology elements should be exploited. Moreover, the model
should provide the flexibility necessary to allow the continuing adoption of improved
practices and tools.

•  Explicit  software  testing  requirements.  These  requirements  should  be  initially 
derived  from  system  requirements  and  refined  during  the  progression  to  code  to 
guide  the  application  of  testing  and  evaluation  technology. 

Dynamic  Analysis  Technology 

There  is  an  evolving  body  of  technology  for  the  dynamic  analysis  of  sequential  programs.  There  is  no 
doubt that disciplined application of available state-of-the-art techniques offers substantial
improvements over current testing practices. Although these techniques may be expensive to apply, their cost
largely  accrues  from  execution  costs;  with  appropriate  automation,  they  do  not  place  excessive  burdens 
on  the  skill  or  labor  required  from  software  developers.  Major  short-term  deficiencies  in  dynamic 
analysis  of  sequential  programs  arise  from  an  absence  of  quantitative  information  on  the  error  and  fault 
detection  capabilities  and  costs  of  existing  techniques,  and  the  slow  growth  of  understanding  on  how  to 
integrate  the  application  of  several  techniques.  One  promising  area  deserving  of  greater  attention  is  the 
use of formal specifications to facilitate functional analysis, allow greater automation of the testing
process, and resolve problems due to the lack of effective oracles.
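
To illustrate how a specification can stand in for a missing oracle, consider a sorting routine whose specification says the output must be ordered and must be a permutation of the input. That statement can be checked mechanically against any output, so test data can be generated automatically without anyone computing expected results by hand. The sketch below is illustrative only: the names `satisfies_sort_spec`, `insertion_sort`, and `run_random_tests` are hypothetical, and Python is used purely for brevity.

```python
from collections import Counter
import random

def satisfies_sort_spec(inp, out):
    """Oracle derived from the specification of sorting:
    (1) out is in nondecreasing order, (2) out is a permutation of inp."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    permutation = Counter(inp) == Counter(out)
    return ordered and permutation

def insertion_sort(items):
    """Unit under test (hypothetical)."""
    result = []
    for x in items:
        i = len(result)
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result

def run_random_tests(unit, trials=200, seed=0):
    """Generate test data automatically and judge each output with the oracle.
    Returns the list of inputs for which the oracle rejected the output."""
    rng = random.Random(seed)
    rejected = []
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not satisfies_sort_spec(data, unit(list(data))):
            rejected.append(data)
    return rejected
```

An empty list from `run_random_tests(insertion_sort)` means no generated input exposed a violation of the specification; it does not demonstrate the absence of faults.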

Technology  for  dynamic  analysis  of  concurrent  and  real-time  software  is  less  mature.  While  testing  of 
sequential  programs  evolved  from  graph-theory  based  modeling  of  control  flow  properties,  there  is  no 
comparably  stable  basis  for  analysis,  debugging,  or  run-time  monitoring  of  concurrent  and  real-time 
software.  Even  so,  some  emerging  techniques  seem  very  promising.  The  resources  required  to  develop 
necessary  automated  support  and  conduct  trials  of  these  techniques  on  realistic  software  efforts  should 
be  provided.  The  slow  progress  in  this  area,  however,  requires  consideration  of  alternative  approaches. 
For  example,  the  use  of  self-testing  software  for  critical  SDS  components  must  be  investigated. 

Dynamic techniques are rarely applied to precode products. The applicability of existing techniques to
executable, pre-implementation representations such as the Strategic Defense Initiative (SDI) Architecture
Dataflow Modeling Technique (SADMT) (see Section 2.2.1) should be examined.

Static  Analysis  Technology 

Again,  there  are  many  static  techniques  for  the  analysis  of  sequential  code.  Since  these  are  largely 
automated, requiring minimal human effort to apply, a base set of static analyses should be routinely
required  for  all  SDS  software.  There  are  relatively  few  techniques  for  the  analysis  of  concurrent  and 
real-time  software.  As  with  dynamic  approaches,  development  of  the  most  promising  techniques  should 
be  fostered  by  providing  the  resources  needed  to  develop  necessary  tools  and  to  allow  evaluation  of  these 
techniques  in  realistic  software  development  environments. 

Unlike dynamic approaches, there are a few techniques for static analysis of pre-implementation
products. The use of a common set of formal representation forms for early SDS development products
which facilitate the use of existing static analysis techniques and provide a common framework for
additional techniques must be examined. In particular, approaches such as fault tree analysis (which
identifies  combinations  of  conditions  which  may  lead  to  critical  system  or  software  failures)  will  be 
extremely  important  in  the  SDS  software  testing  and  evaluation  planning.  For  example,  they  will  help  to 
identify  critical  SDS  components  where  additional  testing  dollars,  or  special  fault-tolerance  approaches, 
are  required. 
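
As a rough illustration of what fault tree analysis computes, the sketch below represents a tree of AND and OR gates over basic events and enumerates its minimal cut sets, that is, the smallest combinations of basic events that cause the top-level failure. The tree, the event names, and the function names are all hypothetical, and the brute-force enumeration is an assumption made for brevity; it is practical only for very small trees.

```python
from itertools import product

# A fault tree node is either a basic-event name (str) or a tuple
# ("AND" | "OR", [children]). The tree below is purely hypothetical.
TREE = ("OR", [
    ("AND", ["sensor_a_fails", "sensor_b_fails"]),  # redundant sensors both fail
    "tracking_software_fault",
])

def occurs(node, state):
    """Return True if the event at `node` occurs given basic-event truth values."""
    if isinstance(node, str):
        return state[node]
    gate, children = node
    results = [occurs(child, state) for child in children]
    return all(results) if gate == "AND" else any(results)

def basic_events(node):
    """Collect the names of all basic events below `node`."""
    if isinstance(node, str):
        return {node}
    return set().union(*(basic_events(child) for child in node[1]))

def minimal_cut_sets(tree):
    """Enumerate minimal sets of basic events that cause the top event.
    Brute force over all combinations; suitable for small trees only."""
    names = sorted(basic_events(tree))
    cuts = []
    for bits in product([False, True], repeat=len(names)):
        state = dict(zip(names, bits))
        if occurs(tree, state):
            cuts.append({n for n in names if state[n]})
    return [c for c in cuts if not any(other < c for other in cuts)]
```

For this tree the minimal cut sets are the single software fault and the pair of sensor failures. A single-event cut set marks a component with no redundancy behind it, which is the kind of result that could direct additional testing or fault-tolerance effort.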

Automated  Support  for  Dynamic  and  Static  Analysis 

From  the  SDS  perspective  there  are  two  promising  trends  here.  First,  recent  tool  developments  are 
focusing  on  supporting  testing  and  evaluation  of  Ada  programs.  In  particular,  the  Ada  language  is  clearly 
becoming the target of choice for tools applying advanced techniques. Second, a few organizations are
undertaking the development of comprehensive testing and evaluation environments which will provide a
broad range of capabilities. Although these efforts are tackling difficult problems, researchers expect
sophisticated  prototype  environments  to  become  available  within  two  years  (see  Section  4.6). 

It  must  be  clearly  understood,  however,  that  existing  tools  are  more  or  less  exclusively  prototypes. 
There are many techniques which are sufficiently mature to justify production-quality automated
support, but the lack of research resources has prevented development of tools which are suitable for
widespread use. Available tools generally lack robustness, complete documentation, speed, and the
ability to handle very large software systems. The need for productization of existing prototype tools is
urgent  and  itself  requires  research  to  develop  increased  understanding  of  the  issues  involved.  One  of  the 
pertinent  issues  is  flexibility.  Large  scale  tool  integration  efforts  will  only  remain  viable  and  useful  if  they 
can  continue  to  integrate  the  increasing  numbers  and  varieties  of  tools  that  will  emerge  in  the  coming 
years. 

Formal  Verification  Technology 

The foundations of formal verification were laid in the 1960’s and followed by prototype verification
system development in the 1970’s. In the early 1980’s, the complexity of formal proofs and practical
limitations on the size of systems that could be verified at the program code-level became apparent.
Incremental improvements in verification tools and environments are being made but are not likely to make
code-level proofs of large systems feasible in the near term. On the other hand, the use of formalism in
software requirements, programming languages, and test specifications (prompted by the verification
community) has increased assurance of correct operation of large systems significantly. Proofs of
high-level designs are feasible, even though code-level proofs remain attainable only for smaller, critical
components and subsystems.
Measurement Technology

The field of software measurement has traditionally been concerned with the application of
stand-alone metrics to software products and processes. Unfortunately, this approach has failed to produce
empirical  results  which  are  of  major  use  to  a  software  developer  or  manager.  Problems  with  the  current 
technology  include  an  inability  to  validate  metrics  or  compare  metric  results  across  multiple  projects  and 
organizations. Metrics are often abused by inappropriately viewing them as the goal of the software
measurement  effort  themselves,  rather  than  low-level  indicators  of  product  and  process  qualities  which 
only  make  sense  in  the  context  of  some  measurement  goal.  However,  when  used  with  discretion,  metrics 
can  provide  insights  into  desirable  and  undesirable  software  characteristics. 

There  are  several  aspects  of  software  measurement  which  must  be  substantially  improved  before 
major  benefits  can  be  achieved.  A  well-established  measurement  methodology  which  helps  in  selecting 
appropriate  project  metrics,  collecting  and  validating  the  data  obtained,  and  analyzing  and  interpreting 
both  the  data  and  the  metrics  must  be  developed.  Methods  for  deriving  metrics  for  specific  application 
domains are also needed. Measurement processes must be integrated into software development
activities in order to ensure early data collection, feedback to the development processes, and reuse of
collection methods. Tools must be developed to automate the activities of the measurement process to the
fullest extent possible.

Reliability Assessment Technology

The  thrust  of  current  software  reliability  assessment  technology  is  prediction  of  a  software  product’s 
future  failure  behavior  from  its  past  failure  behavior.  This  prediction  effectively  supports  management 
activities such as estimating project schedules, optimizing the allocation of project resources, and
optimizing the timing of new software releases. However, it does not adequately support the feedback of
reliability information into the construction of highly reliable software and systems, which is of paramount
importance  in  the  case  of  the  SDS. 

Software reliability assessment technology needs to evolve in a number of directions. First, to
support the construction of reliable software, emphasis must shift toward the software development
process. That is, the targets of software reliability assessment should be software development
methodologies, practices, tools, techniques, and other elements of the software development process, rather than
individual software products. The motivation here is that the best way to construct reliable software is to
utilize software development methodologies that have been shown to afford the highest degree of
reliability.


Second, the technology must enable software reliability to be assessed in a system context. In
distributed real-time systems, the software is responsible for dealing with timing constraints, hardware
failures, and software faults. Accordingly, software correctness and reliability depend on whether the
software meets its requirements with respect to real-time and fault tolerance.

Third, since software reliability must be taken into account when assessing system reliability, the
technology must support system reliability assessment. In regard to this issue, the traditional practice of
casting software reliability in hardware reliability terms and then using combinatorial analysis to derive
system reliability needs to be rethought. In particular, the distinction between design faults and age-related
faults needs to be given further consideration.
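
For readers unfamiliar with the traditional practice being questioned, the sketch below shows the usual combinatorial calculation: each component, software included, is assigned a reliability figure, and the figures are combined under the assumption that components fail independently. The numbers and names are hypothetical and chosen only to make the arithmetic visible.

```python
from math import prod

def series(reliabilities):
    """All components must work: R = product of R_i."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one component must work: R = 1 - product of (1 - R_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical figures, for illustration only.
processor = 0.99
software = 0.95

# Two redundant processors in parallel, in series with the software they run.
system = series([parallel([processor, processor]), software])

# Treating two copies of the same software as independent "redundant" parts
# gives an optimistic figure: identical copies contain the same design faults.
naive_redundant_software = parallel([software, software])
```

The independence assumption is reasonable for wear-out failures of separate hardware units, but a design fault is present in every copy of a program, so the second calculation overstates what software replication alone can deliver. This is one way to see why the design-fault versus age-related-fault distinction matters.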

Recommended  Tasks  to  Exploit  and  Extend  Technology 

Tasks to begin the process of making a sophisticated body of testing and evaluation technology
available for SDS software efforts fall into two groups. The first set of tasks provides the foundation for
bringing state-of-the-art technology into play, while learning more about its effectiveness. The technology
that can be immediately transitioned into practice lies largely in the areas of testing and evaluation of
sequential code products. The problem with less developed technologies lies in identifying those
deficiencies where intensive research has a strong probability of producing practically useful techniques
and tools in time for full scale development of SDS software. A number of tasks are needed to investigate
emerging technology and to identify those areas where fundamental research should be sponsored.
These two groups of recommended tasks are outlined in Figures E-1 and E-2.

One  of  the  important  findings  of  this  report  is  that  the  research  community  has  not  grown  sufficiently 
in the past decade or so and is not large enough or strong enough to meet the challenges raised by SDS
software  testing  and  evaluation.  Steps  to  strengthen  and  expand  the  software  testing  and  evaluation 
research  community  must  be  promptly  taken. 

In addition, it must be recognized that technology transfer is at least as big a problem as technology
development.  A  well-supported  effort  devoted  to  technology  transition  for  SDS  purposes  should  be  put 
in  place  in  the  immediate  future. 
• Implement the test plan and testing requirements concepts to provide the mechanisms for:

— Better integration of testing and evaluation into development activities.

—  Providing  increased  visibility,  control,  and  repeatability  of  testing  and  evaluation  activities. 

•  Establish  a  SDS  Software  Data  Collection  System  which: 

—  Supports  analysis  of  both  SDS  software  and  the  technology  used  to  develop,  test,  and  support  that 
software. 

—  Provides  a  historical  database  capability  and  acts  as  a  focal  point  for  research. 

•  Develop  a  comprehensive  testing  and  evaluation  environment  by: 

—  Exploiting  promising  ongoing  environment  development  efforts. 

— Meanwhile, assembling an interim “environment” from available tools.

•  Embark  on  a  program  of  process  modeling  to  explore  effective,  flexible  ways  of  integrating  testing 
and  evaluation  into  software  development  activities  in  such  a  way  as  to  enable  SDIO  to  keep  up  with 
emerging  technology. 


Figure  E-1.  Tasks  to  Exploit  Technology 


• Conduct a series of technology demonstrations applying evolving technology to specific SDS
problems. Example problems are:

—  Identify  critical  SDS  properties  to  be  formalized  and  verified. 

—  Develop  methods  for  reasoning  about  “degraded”  systems. 

—  Develop  testing  and  evaluation  process  specifications  in  software  contexts. 

•  Sponsor  a  number  of  R&D  tasks  directed  at  specific  gaps  in  technology  (see  Section  9).  For  example: 

—  Promote  the  use  of  increased  formalism  for  early  life  cycle  products. 

—  Develop  a  methodology,  tools,  and  policy  to  support  planning  and  execution  of  regression  testing. 

—  Identify  a  minimum  set  of  preconditions  and  postconditions  which  can  be  required  in  the 
specification  of  all  FSD  SDS  software. 

—  Develop  a  comprehensive  measurement  methodology. 

•  Monitor  ongoing  research  efforts  so  that: 

—  Promising  developments  are  promptly  considered  for  SDS  practice. 

—  The  SDIO  supports  efforts  which  indicate  solutions  to  specific  SDS  problems. 


Figure E-2. Tasks to Extend Technology
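
One task in Figure E-2 calls for a minimum set of preconditions and postconditions in software specifications. As a hedged illustration of what such conditions look like once attached to code, the sketch below checks a precondition before a routine runs and a postcondition on its result. The `contract` decorator and `integer_sqrt` routine are hypothetical names introduced here; Python is used for brevity and is not the language of SDS software.

```python
import functools

def contract(pre, post):
    """Attach a precondition on the arguments and a postcondition on
    (result, arguments) to a function; violations raise AssertionError."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            assert pre(*args), f"precondition of {func.__name__} violated"
            result = func(*args)
            assert post(result, *args), f"postcondition of {func.__name__} violated"
            return result
        return wrapper
    return decorate

@contract(pre=lambda x: x >= 0,
          post=lambda r, x: r >= 0 and r * r <= x < (r + 1) * (r + 1))
def integer_sqrt(x):
    """Largest integer r with r*r <= x."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

Calling `integer_sqrt` with a negative argument fails the precondition immediately, so the misuse is reported at the interface rather than surfacing later as an unexplained failure. Note that Python disables `assert` statements when run with the `-O` option.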
1. INTRODUCTION

Software testing and evaluation is widely recognized as one of the primary challenges in the
development of the Strategic Defense System (SDS). This challenge arises from several factors. First and
foremost, no system of comparable size has ever before been developed. Second, the system must be
reconfigurable to adapt to rapidly changing requirements, some of which will arise dynamically due to
countermeasures by adversaries. Third, it must perform massively parallel computations distributed on a
network of complex system components. Fourth, there will be hard real-time deadlines that must be met
by the SDS to achieve its mission and to ensure the safety of the system and environment. Fifth, the
system must be fault tolerant not only to continue operation in the face of inherent software and hardware
failures, but to survive in a hostile environment. The practical and political limitations on full-scale
testing in an operational environment further exacerbate the testing problem.

It must be accepted that SDS software testing and evaluation needs cross the boundaries of current
technology. If the traditional approach of postponing investigation of testing problems until testing
activities are due to commence is followed, then the predictions of failure made by Parnas [Parn85] and others
[Lin85, Bump87] may be fulfilled. Moreover, testing and evaluation cannot be addressed independently of
software development. Software must be developed with error prevention in mind and designed to
facilitate testing and evaluation. Since SDS prototypes are already being developed, the software testing and
evaluation challenge must be squarely faced and immediate actions taken to investigate solutions.

This  report  is  the  first  step  in  an  effort  directed  at  identifying  the  technology  that  is  needed  for  SDS 
software testing and evaluation. The SDS development approach and special software testing and
evaluation concerns are discussed in Section 2. These are used to develop a conceptual model of the SDS
software testing and evaluation process which provides a framework for inducing the necessary
synergism between development and testing activities. Sections 3 through 7 review the state-of-the-art in
software  testing  and  evaluation  technology.  Section  8  then  maps  current  technology  against  needs  to 
determine  what  must  be  done  to  exploit  the  best  of  available  technology.  Although  transitioning  the 
state-of-the-art  testing  and  evaluation  technology  into  practice  can  be  expected  to  yield  substantial 
improvements  in  software  reliability,  it  by  no  means  ensures  a  sufficient  technology  for  SDS  purposes. 
Section  9  recommends  a  number  of  R&D  tasks  to  begin  the  process  of  resolving  these  deficiencies. 

The remainder of this section sets the scene for the following discussions by outlining the role of
testing and evaluation activities in the software development life cycle. The current state-of-the-practice is
reviewed  to  provide  a  baseline  against  which  possible  improvements  can  be  assessed.  Finally,  the  role  of 
the  Institute  for  Defense  Analyses  (IDA)  Testing  and  Evaluation  Workshop  held  in  support  of  this  work 
is  briefly  described.  First,  some  definitions  of  primary  terms  are  appropriate. 

1.1  Definitions  of  Terms 

In this report, the term testing and evaluation is used in the general sense to refer to the planning,
conducting, and reporting of all activities involved in software validation and verification. In this context,
validation  and  verification  are  not  distinguished  as  two  separate  activities  but  used  jointly  to  refer  to  the 
process  of  reviewing,  inspecting,  testing,  checking,  auditing,  or  otherwise  establishing  and  documenting 
whether or not items, processes, services, or documents conform to specified requirements. For the
purposes of this report, the terms dynamic analysis, static analysis, formal verification, and measurement
will  be  used  to  refer  to  the  different  types  of  validation  and  verification  activities. 

Dynamic analysis approaches rely on executing a piece of software with selected test data to detect, or
in some cases demonstrate the absence of, software faults. Static analysis approaches have the same
goals, but do not base the recognition of faults on expected software outputs. Formal verification
techniques, the most rigorous analysis approaches, apply formal, mathematical principles to prove the
correctness  of  software  designs  and  program  code  with  respect  to  a  formal  specification  of  the  behavior 
in  question.  Measurement  techniques  are  concerned  with  the  quantitative  evaluation  of  critical  properties 
of  both  software  products  and  the  processes  used  to  develop  and  support  software.  Techniques  for 
assessing software reliability can be isolated as a subset of measurement techniques which base
evaluation of the property in question on the occurrence of software failures experienced during testing.

An  error  is  a  mental  mistake  made  by  a  software  developer.  Its  manifestation  may  be  a  textual  problem 
within  the  software  called  a  fault.  A  failure  occurs  when  an  encountered  fault  prevents  the  software  from 
performing  a  required  function  within  specified  limits. 
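
The distinction among the three terms can be seen in a small, hypothetical example. The function below contains a fault that is executed on every call, yet a failure is observed only for some inputs. The function name and the data are assumptions made for illustration.

```python
def average(values):
    """Specified to return the arithmetic mean of a non-empty list."""
    # Fault: the divisor should be len(values). The developer's error was
    # assuming every list passed in would hold exactly two readings.
    return sum(values) / 2

# The fault is executed on every call, but a failure is observed only
# when the input exposes it.
no_failure = average([4, 6])      # 5.0, which happens to be correct
failure = average([3, 6, 9])      # 9.0, but the specification requires 6.0
```

A test set consisting only of two-element lists would execute the faulty statement without ever revealing it, which is why the selection of test data matters.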

Debugging  is  an  activity  related  to  testing  and  evaluation.  While  testing  and  evaluation  activities  are 
designed  to  detect  faults,  debugging  is  concerned  with  textually  isolating  these  faults  and  eliminating  the 
underlying  error. 

Definitions of additional terms are given in the accompanying glossary.

1.2 Overview of the DOD Testing and Evaluation Process

As  software  development  proceeds,  successive  products  are  reviewed  against  the  requirements 
specified  by  their  predecessors.  Typically,  these  reviews  focus  on  the  consistency  and  completeness  of 
the  new  products,  and  little  rigorous  testing  and  evaluation  is  performed  on  precode  software  products. 
Testing  and  evaluation  of  code  products  occurs  in  three  stages:  unit,  integration,  and  system  testing.  As 
code  is  developed,  each  unit  is  tested  individually.  In  a  programming  language  such  as  Ada  [MIL83], 
these units may be subprograms, packages, tasks, or generic units. Units are then incrementally
combined to test the interfaces between them. In later stages, this integration testing combines hardware and
software elements, proceeding until the entire system is integrated. Integration testing may follow a
top-down or bottom-up strategy. In top-down testing, the most abstract software units are combined first and
stubs are used to represent the more detailed units they invoke. Bottom-up testing starts with examining
the interfaces among the most detailed units, which are executed by a test driver that invokes the units in
the  proper  manner  and  provides  the  necessary  input  data  to  each.  There  are  variations  on  these  basic 
strategies;  for  example,  outside-in  testing  allows  the  units  most  directly  concerned  with  handling  inputs 
and  outputs  to  be  tested  first.  The  final  stage  of  testing  is  system  testing.  Here  the  system  as  a  whole  is 
examined  to  determine  whether  it  meets  its  specified  requirements. 
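
The roles of stubs and drivers can be sketched with a hypothetical two-level design in which an abstract unit invokes a more detailed one. All names below are assumptions introduced for illustration, and Python is used for brevity; in Ada the units would be subprograms or packages.

```python
# Hypothetical two-level design: `plan_route` (abstract unit) invokes
# `distance` (detailed unit).

def distance(a, b):
    """Detailed unit: Manhattan distance between two grid points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def plan_route(points, distance_fn):
    """Abstract unit: total length of a route visiting points in order."""
    return sum(distance_fn(p, q) for p, q in zip(points, points[1:]))

# Top-down: test plan_route first, with a stub standing in for distance.
def distance_stub(a, b):
    """Stub: returns a fixed value so only plan_route's logic is exercised."""
    return 1

def test_plan_route_top_down():
    assert plan_route([(0, 0), (5, 5), (9, 9)], distance_stub) == 2  # two legs
    assert plan_route([(0, 0)], distance_stub) == 0                  # no legs

# Bottom-up: test distance first, using a driver that supplies the inputs.
def distance_driver():
    """Driver: invokes the detailed unit with test data and checks results."""
    cases = [(((0, 0), (3, 4)), 7), (((2, 2), (2, 2)), 0), (((1, 5), (4, 1)), 7)]
    return all(distance(a, b) == expected for (a, b), expected in cases)

# Integration: combine the real units and test the interface between them.
def test_integration():
    assert plan_route([(0, 0), (3, 4), (3, 0)], distance) == 11
```

Either order ends with the same integration test; the strategies differ in which scaffolding (stubs or drivers) must be written and in which units are exercised earliest.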

The  approved  DOD  model  for  software  reviews  and  audits  is  shown  in  Figure  1-1,  reproduced  from 
DOD Military Standard 2167A [DOD88], entitled Defense System Software Development. Here computer
software configuration items (CSCIs) are partitioned into computer software components (CSCs), which
may themselves be decomposed into further CSCs and computer software units (CSUs). DOD-STD-2167A
specifies the products required from each development, review, audit, and testing activity,
together with evaluation criteria for major products. Although these evaluation criteria do address such
issues as traceability, understandability, and test coverage of requirements, they are largely subjective
criteria questioning the use of appropriate techniques or adequate test coverage and do not require
measurement of quantifiable properties.

Major defense acquisition programs require a Test and Evaluation Master Plan (TEMP). This
document is used by the Office of the Secretary of Defense (OSD) and all DOD components for oversight of
test and evaluation (T&E) activities. It is the basic planning document for all T&E related to a particular
system acquisition, addressing T&E of both hardware and software elements. It specifies required
technical and operational characteristics for the system, the T&E responsibilities of all participating
organizations, a timing sequence for T&E activities, and necessary T&E resources. As a development
effort proceeds, the conduct and results of T&E activities are included.

Figure 1-1. DOD 2167A - An Example of System Development Reviews and Audits

Traditionally, a software TEMP discusses T&E activities in the context of a waterfall software
development  life  cycle.  These  activities  fall  into  two  categories.  Developmental  Test  and  Evaluation 
(DT&E)  addresses  attainment  of  critical  technical  characteristics  such  as  correctness  and  certifiability. 
Operational  Test  and  Evaluation  (OT&E),  on  the  other  hand,  is  concerned  with  the  operational 
effectiveness  and  suitability  that  are  critical  to  the  system’s  mission.  OT&E  may  be  preceded  by 
Qualification Testing (QT) and Initial Operational Test and Evaluation (IOT&E). Once the system is
deployed, it may be continued with Follow-on Operational Test and Evaluation (FOT&E).

It  is  generally  good  practice  to  have  test  teams  which  are  independent  of  the  software  developers.  This 
prevents  the  developers  from  propagating  any  misconceptions  of  the  software  through  to  testing  and 
evaluation activities. It also allows for distinguishing between the different thought processes and skills
required  in  testing;  a  good  software  developer  is  not  necessarily  a  good  tester,  and  vice  versa.  Many 
DOD  projects  require  software  to  be  tested  and  evaluated  by  an  independent  verification  and  validation 
(IV&V) organization prior to acceptance. Ideally, the IV&V organization is provided by the eventual
software  support  agency. 

1.3 State-of-the-Practice in Software Testing and Evaluation

Testing  and  evaluation  have  long  been  regarded  as  one  of  the  weakest  areas  in  the  development  of 
software systems. The structured programming movement started by Dijkstra in the 1960’s [Dijk76a] was
motivated by a desire to improve software quality, specifically by espousing development of demonstrably
correct code from rigorous specifications. This movement has since grown into a much wider
activity  and  the  focus  has  broadened  to  address  the  introduction  of  increased  rigor  and  structure  into 
early  life  cycle  activities.  These  efforts  have  led  to  considerable  advances  in  testing  and  evaluation  tech¬ 
nology  over  the  past  decade.  Unfortunately,  these  advances  have  not  been  matched  by  a  corresponding 
improvement  in  the  state-of-the-practice.  It  is  not  uncommon  for  intrinsic  flaws  to  become  apparent  late 
in  the  development  of  a  system,  necessitating  the  discard  of  efforts  that  have  already  consumed  substan¬ 
tial  resources.  Even  worse,  relatively  trivial  failures,  often  arising  from  unforeseen  combinations  of  cir¬ 
cumstances,  have  caused  several  medical,  automotive,  aeronautical,  and  defense  systems  to  result  in  the 
loss  of  life  [Neum87]. 

The  typical  practice  in  software  testing  and  evaluation  revolves  around  a  software  developer’s  intui¬ 
tion.  As  part  of  the  DOD’s  Software  Test  and  Evaluation  Project  (STEP),  a  review  of  current  practices 
in  twelve  development  and  IV&V  organizations  undertaking  DOD  software  efforts  was  performed 
[DeMi87a].  By  and  large,  the  only  formalized  techniques  used  in  unit  testing  were  structural  testing  (to 
ensure  that  a  given  percentage  of  statements  or  control  paths  were  executed  during  testing),  exercising 
the  code  with  extremal  and  special  values,  and  manual  reviews.  Only  one  development  organization  used 
functional  testing  (to  examine  the  conformance  of  a  software  implementation  to  its  specification),  and 
only  one  IV&V  organization  used  metrics  to  evaluate  critical  software  properties.  Integration  and  system 
testing  primarily  consisted  of  functional  testing,  with  only  one  development  organization  performing  reli¬ 
ability  assessment.  There  was  minimal  use  of  automated  tools  to  reduce  the  traditionally  labor-intensive 
nature  of  testing.  The  only  tools  used  by  three  or  more  organizations  were  file  comparators,  analyzers  or 
code  auditors,  and  dynamic  execution  verifiers.  Although  the  STEP  report  was  prepared  in  1981  (it  was 
slightly  revised  in  1987)  these  findings  still  reflect  current  practices. 



The  SDIO  must  ensure  that  all  contractors  developing  SDS  software  use  the  best  available  technol¬ 
ogy.  This  cannot  be  achieved  by  simply  imposing  contractual  requirements  on  software  developers. 
Software  developers  must  be  provided  incentives  to  adopt  improved  practices.  To  this  end,  the  technol¬ 
ogy  must  first  be  applied  to  real  software  efforts  to  prove  its  utility  and  demonstrate  the  benefits  that 
accrue,  and  the  results  of  these  efforts  widely  disseminated.  Furthermore,  a  rigorous  testing  and  evalua¬ 
tion  approach  can  only  be  introduced  into  widespread  practice  when  the  automated  tools  that  support  its 
application  are  available. 

1.4 The IDA Testing and Evaluation Workshop

Although  several  weak  areas  in  technology  are  clearly  obvious,  it  is  difficult  to  determine  which  con¬ 
stitute  critical  shortcomings  where  intensive  R&D  over  the  next  few  years  can  be  expected  to  result  in 
practically  applicable  technology  that  can  be  exploited  for  Full  Scale  Development  (FSD)  SDS  software. 
The leading researchers in the various areas of testing and evaluation technology were invited to join in a
workshop  to  address  this  issue.  Prior  to  the  workshop,  these  researchers  were  requested  to  provide  posi¬ 
tion  papers  outlining  their  views.  Since  several  of  the  researchers  are  conducting  promising  research 
efforts discussed later in this report, they were also asked to provide explicit data on the current status of
these efforts and describe how useful they expect this evolving technology to be for SDS purposes. The
resulting  collection  of  papers  is  presented  in  IDA  Memorandum  M-513  [Bryk89].  These  researchers 
were  asked  to  review  an  earlier  version  of  this  document.  This  version  reflects  the  inputs  and  comments 
received  from  the  workshop  attendees. 

In  addition  to  researchers  from  academia,  private  companies,  and  DOD  R&D  centers,  representa¬ 
tives  from  DOD  organizations  involved  in  Strategic  Defense  Initiative  Organization  (SDIO)  software 
efforts  were  invited  to  attend  the  workshop.  The  complete  list  of  participants  is  provided  in  Appendix  C. 

2. SDS SOFTWARE TESTING AND EVALUATION NEEDS

The  SDS  possesses  characteristics  which  stress  the  application  of  current  testing  and  evaluation  tech¬ 
nology.  While  these  characteristics  are  individually  not  unique,  never  before  have  they  been  combined  in 
such  a  large  and  safety-critical  application.  The  concerns  arising  from  these  characteristics  have  long 
been  recognized  and  the  overall  SDS  development  approach  has  been  designed  to  facilitate  software 
testing  and  evaluation. 

This  section  reviews  these  special  characteristics  and  outlines  the  SDS  development  approach. 
Against  this  background,  specific  objectives  for  SDS  software  testing  and  evaluation  technology  are 
identified and a practical mechanism for integrating testing and evaluation more fully into software
development activities is presented. Finally, related ongoing SDIO testing and evaluation activities are
identified. 

2.1  Impact  of  SDS  Characteristics  on  Testing  and  Evaluation 

The  SDS  is  conceived  as  a  multi-layered  defense  that  destroys  attackers  in  their  boost,  mid-course, 
and  terminal  phases.  Some  battle  management  functions,  such  as  detection,  acquisition  and  tracking, 
classification,  and  resource  allocation  are  common  to  each  layer  of  the  defense.  On  a  more  global  level, 
there  are  functions  which  occur  in  all  phases  of  battle  or  span  multiple  phases  of  battle  management. 
Examples  in  this  case  include  surveillance,  engagement,  and  situation  assessment  functions.  Con¬ 
currency  will  be  used  widely,  both  to  coordinate  subsystems  that  are  distributed  geographically  and  spa¬ 
tially,  and  to  provide  the  computational  power  needed  on  a  single  platform.  The  Battle 
Management/Command,  Control,  and  Communication  (BM/C3)  system  will  have  to  integrate  a  software 
system  significantly  larger  than  any  previous  system.  The  characteristics  of  weapons  and  sensors  are  yet 
unknown  and  may  remain  fluid  for  several  years.  Consequently,  system  components  will  be  subject  to 
independent modification with possibly changing interfaces. While the system is expected to idle during
most  of  its  operational  life,  in  an  engagement  it  must  operate  under  extreme  and  inflexible  real-time  con¬ 
straints.  Battle  characteristics,  and  the  SDS  components  available  at  that  time,  cannot  be  predicted  with 
any  certainty. 

The  system  will  be  dynamically,  as  opposed  to  statically,  linked.  This  is  necessary  (1)  because  of  the 
constant  physical  movement  of  system  components,  (2)  to  allow  for  rapid  reconfiguration  of  the  system 
in  response  to  enemy  countermeasures,  and  (3)  to  compensate  for  the  loss  of  components  that  can  be 
expected  in  a  hostile  operating  environment.  The  system  itself  is  expected  to  be  developed  and  deployed 
in  an  evolutionary  manner,  with  each  version  of  the  system  meeting  different  requirements.  These 
requirements  will  evolve  to  provide  increasing  functionality,  respond  to  major  advances  in  technology 
employed  in  (or  countered  by)  the  SDS,  and  reflect  changes  in  the  political  arena. 

2.1.1  Concurrent,  Distributed  Software 

Concurrent  software,  whether  with  apparent  parallelism  on  a  uniprocessor  or  actual  parallelism  on 
multiprocessor  or  distributed  architectures,  is  subject  to  types  of  failure  which  do  not  occur  with  sequen¬ 
tial  software.  These  failures  arise  from  problems  with  the  synchronization  and  communication  between 
processes.  Examples  include  deadlock  and  the  simultaneous  update  of  shared  variables. 

In  the  initial  testing  process  individual  concurrent  processes  can  be  treated,  in  some  senses,  as 
independent,  sequential  programs.  When  the  time  comes  to  test  the  combined  run-time  behavior  of  the 
processes,  however,  the  non-determinism  inherent  in  their  concurrent  behavior  results  in  additional 
testing problems. When several processes execute with pseudo parallelism on a single processor, for
example,  the  scheduler  may  make  different  choices  about  the  order  of  execution  in  response  to  condi¬ 
tions  external  to  a  program.  Consequently,  a  program  supplied  with  the  same  set  of  test  data  on  two 
different  executions  can  exhibit  markedly  different  behavior.  Moreover,  if  a  program  is  ported  across 
environments,  the  new  environment  may  employ  an  entirely  different  scheduling  algorithm.  It  may  even 
include  different  processor  construction  which  invalidates  previous  testing,  for  example,  a  different 
real-time  clock. 
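The effect of scheduling choices on program behavior can be shown with a small simulation. The sketch below is an illustration under stated assumptions, not a model of any particular scheduler: two hypothetical processes each increment a shared counter in two steps (read, then write), and the schedule is passed in explicitly so that each interleaving can be reproduced.

```python
from itertools import permutations


def run(schedule):
    """Execute two processes that each increment a shared counter once.

    Each process performs two steps: read the counter into a private
    register, then write register + 1 back. `schedule` lists which
    process (0 or 1) takes its next step, standing in for the choices
    a real scheduler would make.
    """
    shared = 0
    register = [None, None]  # private copy held by each process
    step = [0, 0]            # next step per process: 0 = read, 1 = write
    for pid in schedule:
        if step[pid] == 0:
            register[pid] = shared
        else:
            shared = register[pid] + 1
        step[pid] += 1
    return shared


# Same program, same initial data, different scheduling decisions.
serial = run([0, 0, 1, 1])       # process 0 finishes before process 1 starts
interleaved = run([0, 1, 0, 1])  # both processes read before either writes

# Every distinct interleaving of the four steps, and the results observed.
outcomes = {run(s) for s in set(permutations([0, 0, 1, 1]))}
```

The serial schedule leaves the counter at 2, while the interleaved schedule loses one update and leaves it at 1. A test that happens to observe only the serial ordering would not reveal the fault.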

The  non-determinism  of  concurrent  software  considerably  increases  the  difficulty  of  detecting  and 
correcting  errors  and  faults  dependent  on  the  relative  scheduling  of  events  and  resources.  Special  tech¬ 
niques  and  tools  are  required  to  mitigate  these  problems.  Even  relatively  straightforward  tasks,  such  as 
monitoring  testing  coverage,  become  more  complex.  In  this  example,  new  types  of  coverage  measures 
are  required  to  reflect  the  coverage  achieved  for  both  individual  processes  and  synchronization  activities. 
The instrumentation needed to collect coverage information can distort the execution of the program,
thus  introducing  a  complicating  factor  which  exacerbates  the  non-determinism  problem. 

2.1.2 Real-Time Software

The principal distinction of a real-time system is that the physics of the application imposes time
constraints on some of the computations. These time constraints are not merely performance metrics but a
correctness  property  for  the  computations.  In  other  words,  performance  is  a  critical  factor.  The 
software  functionality  cannot  be  tested  independently  of  performance  and,  of  course,  this  is  dependent 
upon  the  execution  environment.  In  systems  where  the  requested  processing  in  a  given  time  period  is 
likely,  at  times,  to  exceed  available  computation  power,  the  issues  of  timeliness  and  importance  are 
tightly  coupled.  Testing  must  demonstrate  that  performance  has  been  optimized  for  the  most  important 
cases.  In  command,  control,  and  communication  systems,  such  as  the  SDS  BM/C3,  these  important 
cases  are  typically  the  exceptional  ones,  not  the  most  frequent  ones. 
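The point that timing is part of correctness, rather than a separate performance measure, can be expressed as a test oracle that rejects a right answer delivered late. The following is a minimal sketch. The simulated clock, the averaging computation, and the 10 ms deadline are all assumptions made for illustration; a simulated clock is used so that the example is repeatable and does not depend on the execution environment.

```python
class SimulatedClock:
    """Hypothetical clock under test control, in milliseconds."""

    def __init__(self):
        self.now_ms = 0

    def __call__(self):
        return self.now_ms

    def advance(self, ms):
        self.now_ms += ms


def make_tracker(clock, cost_ms):
    """Build an illustrative computation that consumes cost_ms of time."""
    def track(samples):
        clock.advance(cost_ms)  # stands in for real execution time
        return sum(samples) / len(samples)
    return track


def check_response(compute, inputs, expected, deadline_ms, clock):
    """Pass only if the value is right AND it arrived within the deadline."""
    start = clock()
    value = compute(inputs)
    elapsed = clock() - start
    return value == expected and elapsed <= deadline_ms


clock = SimulatedClock()
fast = check_response(make_tracker(clock, 4), [2, 4, 6], 4.0, 10, clock)
slow = check_response(make_tracker(clock, 25), [2, 4, 6], 4.0, 10, clock)
```

Both computations return the same value, but only the first passes: the second misses its deadline and is therefore judged incorrect.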

Process  control  systems  are  good  examples  of  real-time  systems.  These  systems  are  required  to 
respond  to  inputs  within  restricted  time  constraints  to  control  ongoing  external  processes.  There  is 
inherent  non-determinism  in  the  relative  scheduling  of  events  that  arises  from  the  varying  order  in  which 
inputs  may  occur.  As  with  concurrent  software,  this  non-determinism  poses  problems  for  testing  and 
evaluation.  The  rigorous  timing  constraints  inherent  to  all  real-time  systems  give  rise  to  a  further 
difficulty;  namely,  any  instrumentation  of  code  has  a  significant  impact  on  the  software  performance  and 
considerably  complicates  the  analysis  and  repeatability  of  test  results. 

Since  the  execution  environment  is  typically  highly  specialized,  real-time  systems  are  usually 
developed  on  a  different  machine  (the  host)  than  that  on  which  they  will  operate  (the  target).  This  allows 
the  host  environment  to  provide  development  tools  which  may  be  unavailable  in  the  target  environment. 
Testing  and  evaluation  are  performed  on  both  machines.  While  testing  practices  vary  from  one  project  to 
another, it is common for testing on the host to emphasize unit testing using the standard techniques
employed  for  non  real-time  applications.  Integration  testing  may  also  be  performed,  but  later  stages  of 
integration  testing  require  an  environment  simulator.  Unit  and  integration  testing  are  repeated  on  the  tar¬ 
get  machine,  along  with  system  testing.  Here  again,  an  environment  simulator  may  be  required.  If  the 
target  machine  has  no  debugging  facilities,  failures  occurring  during  target  testing  may  require  recon¬ 
structing  the  test  in  question  on  the  host.  This  is  exceptionally  difficult  and  not  always  possible. 

Although  the  limitations  of  the  target  environment  may  necessitate  testing  on  the  host,  differences 
between  these  environments  may  cast  doubt  on  the  validity  of  the  testing. 

2.1.3 Dynamic Environments and Fault Tolerance

Dynamic  linkup  of  subsystems  and  components  introduces  another  level  of  uncertainty.  If  all  possible 
reconfigurations,  and  the  operational  requirements  which  apply  to  each,  could  be  explicitly  identified,  it 
would  still  be  infeasible  to  fully  test  each  instantiation  of  the  system.  Nevertheless,  techniques  for 
rigorously analyzing the factors which may necessitate reconfiguration and determining how the system
should  respond  to  each  factor  individually  or  in  combination  must  be  developed.  Innovative  testing  and 
evaluation  approaches  are  needed  to  provide  the  greatest  possible  coverage  of  the  diverse  environments 
and  operating  modes  expected  to  be  encountered.  In  particular,  all  safety  critical  conditions  leading  to, 
or  arising  from,  reconfiguration  must  be  identified. 

One of the uses of dynamic environments is to increase the fault tolerance of a system. The infeasibility
of routine maintenance and repair of space-based components, and lack of time for any repair activities
during an engagement, mandate that SDS make extensive use of fault tolerance. Testing and evaluation
of fault tolerant systems is not a well-understood process. Although some aspects, such as the need
to drive the system to failure, are recognized, they present additional testing challenges for the SDS.

At a more software specific level, fault tolerance poses additional questions. Consider for a moment
N-version programming, which is one of the best known techniques for increasing software fault tolerance.
The basic premise of N-version programming is that different software developers are likely to
introduce different faults into the software. Therefore, a large number of versions of a piece of software
are developed independently and executed together. The results from all the versions are compared and
the consensus assumed to be correct. Some researchers hold such high expectations of the inherent
tolerance of faults in the different versions as a group, that they claim individual versions need less testing
than usual [Aviz85]. Unfortunately, recent experiments [Knig86a] indicate that the necessary statistical
independence of faults rarely occurs in practice, undermining the fundamental assumption of N-version
programming. Until there are approaches for analyzing the fault independence achieved in any
particular application, and determining the resultant impact on reliability, this technique must be used
with care. Similar cautions apply to other software fault tolerance techniques.
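The dependence of N-version programming on fault independence can be seen in a small sketch of a majority voter. The versions below are hypothetical implementations of an absolute-value function with deliberately seeded faults; they are assumptions for illustration and are not taken from the cited experiments.

```python
from collections import Counter


def vote(results):
    """Return the strict-majority result, or None if there is none."""
    value, count = Counter(results).most_common(1)[0]
    return value if count > len(results) / 2 else None


def run_versions(versions, x):
    """Execute every version on the same input and vote on the results."""
    return vote([version(x) for version in versions])


def correct(x):
    return abs(x)


def faulty_at_zero(x):
    """Seeded fault: wrong only for the input 0."""
    return 1 if x == 0 else abs(x)


def sign_error_a(x):
    """Seeded fault: forgets to negate negative inputs."""
    return x


def sign_error_b(x):
    """Different code, same misconception about negative inputs."""
    return (x * x) // x if x != 0 else 0


# Faults on different inputs: each failure is outvoted.
independent = [correct, faulty_at_zero, sign_error_a]

# Two versions share a fault: the wrong answer wins the vote.
correlated = [correct, sign_error_a, sign_error_b]
```

With the independent set, the voter returns the correct result for both -3 and 0. With the correlated set, the input -3 yields a consensus of -3, so the voter confidently reports a wrong answer.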

2.1.4  In-Line  Testing 

The  inability  to  perform  full-scale  operational  testing  in  its  deployed  environment  is  one  of  the  most 
frequently  stated  arguments  against  the  feasibility  of  the  SDS.  It  is  a  legitimate  concern.  Even  ignoring 
the  various  political  constraints,  there  are  very  real  technical  and  economic  prohibitions  against  such 
full-scale  testing.  It  would  be  very  visible,  providing  potential  adversaries  with  information  which  could 
be  used  to  negate  the  usefulness  of  the  system.  Moreover,  the  validity  of  the  testing  would  be  short-lived, 
outdated  by  even  minor  changes  in  battle  characteristics  that  are  not  under  U.S.  control. 

Nevertheless,  the  system  must  be  designed  to  facilitate  in-line  testing  once  deployed.  Although  the 
bulk  of  the  testing  will  be  performed  prior  to  deployment,  some  testing  will  subsequently  be  necessary  if 
only  to  validate  assumptions  made  in  the  earlier  testing,  or  diagnose  failures  reported  by  monitoring 
processes.  The  prime  difficulty  here  is  that,  once  deployed,  the  system  must  operate  continually. 
Although most system components will be inactive for an indefinite period, the SDS must be ready to
respond  instantly  to  any  threat.  The  capability  for  in-line  software  testing  has  to  be  designed  into  the  sys¬ 
tem. 

2.1.5  Evolving  Requirements 

The impact of continually changing requirements on software testing and evaluation is more pragmatic
than the concerns so far raised. The problem is largely an economic one; the need to perform expensive
retesting  after  a  change  is  made.  Planning  for  such  regression  testing  is  often  forgotten  during  system  and 
software engineering activities when developers are already faced with considerable technical challenges.
If,  however,  the  retest  concern  is  not  addressed  in  a  timely  manner,  the  result  will  be  software  which, 
when  modified,  requires  retesting  out  of  all  proportion  to  the  scale  of  the  change.  It  would  rapidly 
become  economically  infeasible  to  keep  such  a  system  responsive  to  changing  threats. 

The  amount  of  regression  testing  required  after  a  software  change  must  be  (1)  predictable,  and  (2) 
proportional to the size of the change. Traceability from system requirements to later development products
is not sufficient to ensure this. Good engineering practices, such as information hiding, must be
employed with regression testing specifically in mind, at the earliest stages of system development.
Regression  testing  will  also  be  facilitated  by  maintaining  good  records  of  testing  activities.  Since  it  is 
unrealistic  to  expect  that  full  details  about  all  testing  events  can  be  stored,  a  mechanism  for  identifying 
and  structuring  pertinent  details  must  be  developed. 
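One way to see how information hiding keeps regression testing proportional to a change is a sketch of dependency-based test selection. The module names, the dependency map, and the selection rule are all assumptions made for illustration: a change confined to a module's hidden implementation is taken to require only that module's tests, while a change to its interface is conservatively taken to require retesting every module that depends on it, directly or indirectly.

```python
def affected_modules(changed, uses, interface_changed):
    """Return the set of modules to retest after `changed` is modified.

    uses[m] is the set of modules that m depends on. The rule applied
    here is an illustrative assumption, not an established standard.
    """
    affected = {changed}
    if not interface_changed:
        return affected  # change is hidden behind an unchanged interface
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for module, dependencies in uses.items():
            if current in dependencies and module not in affected:
                affected.add(module)
                frontier.append(module)
    return affected


def select_tests(changed, uses, tests_for, interface_changed):
    """List the recorded tests that must be rerun, in sorted order."""
    modules = affected_modules(changed, uses, interface_changed)
    return sorted(t for m in modules for t in tests_for.get(m, []))


# Hypothetical system structure and test records.
uses = {
    "sensor_io": set(),
    "tracker": {"sensor_io"},
    "allocator": {"tracker"},
    "display": {"allocator"},
}
tests_for = {
    "sensor_io": ["t_io"],
    "tracker": ["t_track"],
    "allocator": ["t_alloc"],
    "display": ["t_disp"],
}

hidden_change = select_tests("tracker", uses, tests_for, False)
interface_change = select_tests("tracker", uses, tests_for, True)
```

Under these assumptions the hidden change reruns one test and the interface change reruns three, which makes the retest cost predictable from the recorded structure before the change is made.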

Revisions to system requirements are not necessarily confined to new versions of the system. For
example, a change in the attack capabilities of adversaries may require immediate modifications to the
system.  The  speed  with  which  these  modifications  can  be  implemented  and  validated  will  be  critical  since 
the  system  may  be  unable  to  fulfill  its  mission  in  the  interim  and  would  itself  be  increasingly  vulnerable  to 
attack.  These  support  activities  cannot  be  left  to  less  experienced  software  developers,  as  is  often  the 
case.  Skilled  software  developers  who  have  an  intimate  knowledge  of  the  system  must  be  available  to 
devote  their  immediate  attention  to  implementing  necessary  changes. 

2.2 The SDS Development and Testing Paradigm

As is apparent from the preceding discussion, consideration of software testing and evaluation cannot
be postponed until the software is designed. It must be an integral part of the whole development process.
The SDIO has begun to act upon this insight, and some initial decisions have already been made.

2.2.1  The  SDS  Software  Development  Approach 

The  SDS  software  development  approach  will  integrate  design,  prototyping,  and  simulation  to  allow 
formulation  of  SDS  requirements  to  be  tied  to  early  analyses  of  the  effectiveness  and  suitability  of  alter¬ 
native  architectures.  A  key  ingredient  of  this  approach  is  the  use  of  formal  specifications  for  representa¬ 
tion  of  SDS  architectures.  Formal  specifications  offer  several  benefits.  They  help  to  ensure  that  system 
interfaces  and  functions  are  cleanly  and  modularly  separated,  thus  improving  the  testability  of  the  system 
as a whole. They are also a prerequisite for all formal testing and evaluation approaches, including testing
formal specifications and proving formal properties about the specifications. One of the most
significant benefits, however, accrues from their potential for simulation. The SDIO has developed a formal
notation, called the Strategic Defense Initiative Architecture Dataflow Modeling Technique
(SADMT) [Linn88], to exploit this potential. This notation is designed to meet the specific needs of the
SDS  and  BM/C3  architectures.  It  represents  architecture  designs  in  such  a  way  that  they  can  be  directly 
input  to  simulation. 

Expanded  capabilities  of  this  nature  are  being  developed  for  the  National  Test  Bed  (NTB)  where 
requirements  range  from  high-level  system  simulation  through  full-fidelity  simulation  of  individual  com¬ 
ponents.  This  approach  supports  the  detection  of  requirement  and  design  errors  early  in  the  develop¬ 
ment  process  when  their  correction  is  easy  and  does  not  require  undoing  much  completed  work. 



Early operational testing is a natural consequence of the prototyping approach and timely analysis and
comparison  of  architectural  models  is  possible.  Prototypes  can  be  exercised  to  determine,  for  example, 
the  impact  of  various  damage  scenarios  on  the  ability  of  an  architecture  to  continue  functioning.  Results 
can be refined and expanded through an eventual full-scale engineering decision and deployment. This
allows  the  system  and  its  components  to  be  tested  and  evaluated  over  an  extended  period  of  time  and 
under a wide variety of operating modes, conditions, and environments.

The  feasibility  of  building  a  testable  SDS  is  tied  to  the  choice  of  an  SDS  architecture.  The  SDS  has 
been  described  as  a  system  that: 

"...  would  need  to  respond  to  an  offensive  strike  as  a  single  organism,  coordinating 
perhaps  millions  of  separate  actions  in  a  schedule  timed  in  milliseconds.”  [Adam85] 


System  architectures  that  require  such  coordination  between  elements  demand  excessively  sophisticated 
software  and  cannot  be  adequately  tested.  The  complex  interaction  of  the  units  or  components  in  such 
an  architecture  means  that  the  test  of  an  individual  unit  will  provide  little  assurance  about  the  adequacy 
of  functioning  for  the  system  as  a  whole.  Consequently,  distributed,  decentralized  architectures  which 
reduce  the  complexity  of  the  software  by  eliminating  needless  coordination  have  been  chosen.  Indepen¬ 
dence  allows  clear  design  and  separation  of  functions  and  explicit  specification  of  the  interfaces  among 
the  functions.  This  in  turn  allows  use  of  implicit  coordination  schemes  where  elements  can  act  indepen¬ 
dently  based  on  their  limited  knowledge  of  the  status  of  other  elements  and  how  these  are  anticipated  to 
behave.  Of  course,  achieving  a  good,  distributed,  decentralized  architecture  is  not  easy.  It  requires  an 
effective modular decomposition of the system although there is little experience in how to accomplish
this  and  how  to  evaluate  the  results.  Nevertheless,  the  use  of  decentralized  architectures  makes  testing 
SDS  analogous  to  testing  current  offensive  weapons.  In  addition  to  increasing  the  testability  of  the 
overall  system,  concepts  such  as  these,  that  reduce  the  complexity  of  required  software,  also  increase 
system  security,  evolvability,  and  robustness. 

2.2.2  Use  of  the  Ada  Programming  Language 

In  accordance  with  DOD  Directives  3405.1  and  3405.2  [DODD87c,DODD87b],  the  SDIO  has 
adopted a policy requiring all software to be developed in Ada. This required use of a single programming
language provides an enormous advantage for testing and evaluation; namely, it allows a common
set  of  code-level  testing  and  evaluation  techniques  and  tools  to  be  used  across  all  SDS  software  efforts. 
This  makes  the  software  developers’  work  easier,  facilitates  the  introduction  of  state-of-the-art  technol¬ 
ogy,  and  increases  the  cost-effectiveness  of  necessary  tool  development. 

Ada  was  specifically  developed  to  support  “programming-in-the-large.”  Productivity  and  maintenance 
issues  were  key  drivers  in  its  definition.  Savings  both  through  increased  productivity  and  decreases  in 
maintenance  cost  are  expected  from  the  use  of  Ada.  The  techniques  for  interface  specification  and  the 
constructs  for  modularity  and  information  hiding  of  Ada  have  been  specially  designed  to  support  state- 
of-the-art  software  development  methods  and  technology,  including  the  development  of  portable  and 
reusable  code. 

The  choice  of  Ada  as  the  required  programming  language  offers  further  advantages.  Ada  combines 
advantages of many previously separate languages, such as strong typing and data abstraction. Of course,
techniques for evaluating conformance to proper Ada coding practices are necessary to ensure that
strong  typing  and  other  desired  practices  are  exploited  to  take  maximum  advantage  of  the  language. 
Through the use of predefined and user-defined exceptions, Ada also provides a basic capability for run-time
detection and recovery from certain types of faults. Additionally, the Ada Compiler validation
effort has resulted in an impressive improvement in code portability across widely differing machines (in
comparison with other languages). Further advantages in code reuse from the combination of portability,
abstract  interfaces,  and  generics,  are  becoming  more  evident  as  Ada  experience  increases,  especially  in 
the  area  of  constructing  large  systems. 

One of the major challenges facing the SDS is that of producing trusted software. While Ada alone
does  not  ensure  the  development  of  trusted  software,  it  does  provide  language  features  important  to  the 
implementation  and  verification  of  security  attributes. 

2.3  Integrating  Testing  and  Evaluation  into  Development  Activities 

All  too  frequently,  software  testing  and  evaluation  are  regarded  as  tack-on  activities  to  be  performed 
after  code  has  been  written.  A  software  product  is  developed  and  then  testing  and  evaluation  performed 
as  a  discrete  step  to  get  the  “bugs”  out.  Herein  lies  a  fundamental  misconception  that  must  be  eliminated 
before  significant  increases  in  software  reliability  can  be  achieved.  Reliability  cannot  be  tested  into 
software.  Instead,  development  must  be  seen  as  an  error  prevention  activity,  with  testing  and  evaluation 
providing  continual  feedback  on  the  validity  of  the  current  activity  in  its  own  right,  and  its  implications 
for subsequent activities. Only then will the increased visibility into the development process necessary
for timely identification of factors which impact testability and early diagnosis of problem areas be possible.

Consideration  of  testing  and  evaluation  concerns  must  be  brought  forward  to  the  earliest  development 
activities;  this  is  true  at  both  the  system  and  software  levels.  For  example,  the  time  to  develop  the  system 
test  plan  is  when  system  requirements  are  defined.  If  system  requirements  are  accompanied  by  details  on 
how the conformance of the final system with those requirements will be determined, untestable requirements
or those whose testing incurs unacceptable costs can be immediately identified, preventing millions
of dollars being invested in an undeployable system. In the same way, the integration test plan
should be developed during architectural design activities. Usually, the unit test plan would be developed
during detailed design and coding of the software, leading to an overall relationship between test plan
elements as shown in Figure 2-1¹.

This  figure  contains  some  simplifications.  For  a  system  the  size  and  complexity  of  SDS,  for  example,  at 
least one additional level of test planning is necessary, between integration and unit testing. Additionally,
only the major flows between activities and products are shown; the additional flows necessary to
establish the traceability of testing requirements have been omitted.

Figure  2-1  provides  the  starting  point  for  an  SDS  software  testing  and  evaluation  process  model.  It 
demonstrates a practical mechanism for the integration of testing and evaluation concerns into development
activities. The model shows the necessary planning for testing final code products. It also indicates
the  need  to  test  intermediate  products.  Experience  has  repeatedly  shown  that  it  is  most  effective  to 
detect  and  correct  an  error  early  in  the  lifecycle;  the  cost  of  detecting  and  correcting  software  faults 
increases by a factor of 100 or more as the system is integrated [Boeh81]. At each stage in the


1. This model of software processes for developing and testing SDS software was first proposed by Prof. L.J. Osterweil from the
University of California at Irvine. It was subsequently elaborated by Prof. Osterweil and Prof. L.A. Clarke (from the University
of Massachusetts).
development cycle, testing requirements for the next stage must be specified. Of course, during the following
stage, the software developers will accumulate additional testing requirements for the products of
that stage. The initial testing requirements passed on from the previous stage will, however, ensure that
previously identified concerns are addressed in a timely manner.

Testing requirements must be traceable through the entire system development process. Consequently,
they must be maintained as permanent attributes of the software and expressed in a formal notation
which permits the necessary analysis of consistency and completeness. Additionally, these requirements
must be stated in explicit and measurable terms against which testing and evaluation activities can
be  monitored  and  analyzed  to  determine  the  effectiveness  of  testing-to-date  and  identify  any  outstanding 
testing  needs.  Thus,  testing  requirements  are  envisioned  as  driving  the  testing  at  each  stage  and  providing 
a  mechanism  for  integrating  testing  and  evaluation  activities  into  an  overall  testing  strategy. 


Figure  2-1.  Preliminary  SDS  Software  Testing  and  Evaluation  Process  Model 

What is most important about this proposed process model is not the specific process itself, but rather
that this process is an example of the flexibility in performing testing and evaluation that arises from the
principle of software process modeling and the adoption of software process programming as the basis
for software development specification and management. The particular development and testing model
discussed in this report is intended largely to indicate the innovative ways in which testing and evaluation
could be proposed, implemented, and experimentally evaluated if process modeling in general, and process
programming in particular, are understood and exploited. SDS software development efforts should
adopt the principle of software process programming and use it as the basis for exploring effective
software life cycle modeling and as the basis for realizing the kinds of flexibility that will be needed.

In  view  of  the  existing  lack  of  research  results,  experimental  technologies,  and  finished  products,  the 
SDIO  must  expect  that  it  will  have  to  adopt  and  assimilate  techniques  and  tools  of  unexpected  types  and 
varieties. If a specific, fixed and rigid software development process is assumed, it is certain that the
assimilation of these new techniques and tools will be awkward, if not impossible. Thus development life
cycle  flexibility  is  essential. 

It is important to note that the ability to perform rigorous testing and evaluation on intermediate products
is dependent on the use of formal notations at all stages of development. Furthermore, the need to
provide timely feedback on development activities, particularly for huge systems, implies the need for
incremental testing and evaluation performed on possibly incomplete products at each development
stage. As noted by Clarke [ClarSSa], although systems start out incomplete and are developed incrementally,
most of the tools implementing current techniques work on complete software representations.
This handicap must be resolved. Finally, since no single technique is sufficient at any one stage, testing
and evaluation techniques must be applied cooperatively. Simply applying one technique sequentially
after another yields neither increased nor efficient fault detection. Instead, different testing techniques
must be tightly integrated. Quantitative information on the capabilities of particular techniques when
applied to different types of software is also needed.

Many issues remain to be resolved. What information should be captured in the various test plan elements
and in testing requirements? How can the necessary level of confidence in results dictated by the
possible  impact  on  mission  performance  and  other  consequences  of  failure  be  specified?  What  is  the 
minimum  synopsis  of  testing  activities  sufficient  to  facilitate  retesting?  Tasks  to  address  these,  and  other, 
questions  are  discussed  in  Sections  8  and  9. 

2.4  Related  On-Going  Activities 

Before  proceeding,  it  is  useful  to  note  some  of  the  other  SDIO  testing  and  evaluation  activities  that 
are  underway.  While  by  no  means  an  exhaustive  list,  the  following  identifies  those  activities  relevant  to 
the  purpose  of  this  report. 

2.4.1  Organizational  Activities 

The SDIO T&E Directorate has established an SDS T&E Working Group. One of the tasks being initiated
by this group is an activity designed to address outstanding policy, organizational, and logistics
issues for SDS software testing and evaluation. For example, the NTB will be the focal point for element
and system level testing and evaluation and provide a communications network linking geographically
distributed facilities within the National Test Facility (NTF). One of the goals of this group is to determine
whether the NTB should be the IV&V organization for SDS software.

The Simulation Engineering Group has been established as an advisory group to the SDIO on the subject
of simulation and system performance evaluation. It has brought together the leading researchers in
this  area  to  review  ongoing  research.  A  working  group  of  this  panel  is  currently  developing  requirements 
for  the  Advanced  Simulation  Framework  of  the  NTB. 

2.4.2  Software  Center 

The  SDIO  is  establishing  a  Software  Center  which  will  provide  a  focal  point  for  software  technology 
awareness  across  the  development  activities  in  the  SDS  by  an  extensive  training  program  for  senior 
software  engineers.  This  training  will  also  address  the  SDS  development  approach,  for  example, 
software engineering environments, and software methodology and standards. Evaluated, working examples
of approved environments will be available for inspection, and will also be used at the Center in the
evaluation and certification of software submitted to the NTF by the elements. The Center will execute
research projects at the direction of the SDIO, and will track technology advances made in other programs
of interest for the SDS. The Center will also provide technical support for the development of a
secure  repository  of  trusted  reusable  components  (mainly  in  Ada)  for  the  SDS. 

2.4.3  Development  of  Technical  Policy 

Proper use of software engineering approaches is an essential prerequisite for the development of reliable
and testable software. Accordingly, the SDIO is developing a Software Policy [SDIO88a,SDIO88b]
which  specifies  those  practices  and  techniques  that  must  be  employed  in  the  development  of  mission- 
critical  SDS  software.  In  a  related  effort,  the  SDIO  C3  System  Operational  and  Integration  Function 
(SOIF)  is  sponsoring  the  development  of  guidelines  for  tailoring  DOD  Quality  Assurance  standard 
DOD-STD-2168  [DOD86a]  for  use  on  SDS  software  efforts. 

The National Computer Security Center (NCSC) is developing an SDS Security Policy which will provide
guidance on the use of formal verification technology to develop secure and trusted software. The
first  draft  of  this  policy  is  expected  before  the  end  of  1988. 

2.4.4  Planning  Activities 

Development of the SDS TEMP began in early 1987. The most recent version [SDIO87] was completed
in June 1988. While the evolving software annex of the TEMP essentially conforms to DOD
TEMP guidelines [DODD87a], the non-waterfall SDS development life cycle has necessitated some
divergence. For example, to take maximum advantage of the prototyping approach, the software annex
requires early OT&E of an evolving series of prototypes and experimental versions. Early OT&E will
revolve around examination of architecture designs. Subsequently, DT&E and OT&E will be performed
on all major prototypes and experimental versions, through to operational SDS software. T&E of prototypes
and experimental versions will not only examine the technical and operational properties of the
products under test, but will serve as the basis for projecting properties of the deployed system.
Emphasis will be placed as well on an early operational assessment of the process of T&E in conjunction
with the objects of T&E.
3.  DYNAMIC  ANALYSIS  TECHNOLOGY

This  section  provides  an  overview  of  the  state-of-the-art  in  dynamic  analysis  technology.  It  discusses 
the  major  techniques  and  ongoing  research  efforts  relating  to  the  testing  of  both  sequential  and  real- 
time/concurrent  software.  Since  these  techniques  require  executing  software  products  with  test  data, 
they are largely used for testing program code. In those cases where software requirements and designs
are executable, however, many of the techniques can be applied to pre-implementation products.
Although the burgeoning development of large-scale testing and evaluation environments is discussed in
Section 4, special concerns relating to oracles, coverage analyzers, and debuggers are covered in this
section.  Finally,  the  critical  gaps  in  technology  are  elucidated. 

By  necessity,  the  following  material  has  been  kept  brief.  Sources  for  further  information  are  given  in  a 
supporting bibliography [Youn88a]. This bibliography includes an extensive index to guide those seeking
sources  for  information  on  a  particular  topic. 

3.1  Techniques  for  Dynamic  Analysis  of  Sequential  Programs 

Of  the  areas  covered  by  this  report,  technology  for  the  dynamic  analysis  of  sequential  programs  is  the 
most  mature.  This  is  not  to  say  that  there  is  a  complete  technology  sufficient  for  all  practical  purposes. 
Indeed, advances in fundamental theory can be expected for many more years. Currently, while there is a
small body of techniques that can be applied to SDS software, considerable work is needed to turn the
few existing research prototypes into industrial strength, practical tools.

Testing  techniques  can  be  categorized  as  either  white-box  or  black-box.  White-box  approaches  derive 
test  data  from  consideration  of  the  program  structure.  Path  selection  approaches  are  based  on  graph- 
theoretic  notions  of  control  flow  or  data  flow.  They  divide  the  input  space  of  a  program  into  domains 
which  cause  particular  control  or  data  paths  to  be  followed,  and  the  program  is  executed  on  test  cases 
that  are  constructed  by  selecting  test  data  from  these  domains.  Remaining  white-box  approaches  can  be 
further distinguished as either error-based or fault-based². Error-based approaches apply the whole
realm  of  programming  knowledge,  such  as  information  about  error-prone  language  constructs,  to  the 
task  of  selecting  test  data  which  can  find  faults  in  the  execution  of  a  particular  program  path.  Fault-based 
approaches,  as  a  group,  are  the  most  recent  techniques  to  be  developed.  Here  test  data  is  designed  to 
demonstrate  the  absence  of  a  predetermined  set  of  faults  in  program  statements.  Black-box  approaches, 
also called functional testing approaches, derive test data from the functional requirements of a program,
without  regard  to  the  internal  structure  of  the  program. 

The following subsections identify the notable dynamic analysis techniques in each category. As variations
on several of the described techniques exist, these techniques should be regarded as a representative
subset of those available, not an exhaustive listing.

The  key  features  of  several  of  the  techniques  discussed  are  summarized  in  Table  3-1.  This  table 
identifies  the  types  of  faults  that  can  be  detected  and  whether  this  detection  is  guaranteed.  It  identifies  if 


2. Within the field there is a lot of confusion between error-based and fault-based testing. This is partly due to the fact that before
the adoption of the IEEE terminology which gives different meanings to the words “error” and “fault,” these two words were
used interchangeably. Consequently, the classifications used in this report are sometimes historical artifacts and the distinctions
between the identified methods are not always so clear.
automated tools are available to support application of the techniques to Ada programs, or programs
written in some other language. Under the heading of Inputs to the Testing Process, it indicates whether
common inputs such as program specifications, program text, and test data are required. Path analysis of
a program (usually provided by symbolic evaluation, see Section 4.1) is included in this category as
another relatively common input. Additionally required inputs are identified separately. Similarly, under
Outputs of the Testing Process, the table shows which techniques provide program outputs, which
require oracles to predict the correct outputs against which program outputs can be compared, and
which support locating the position of a fault in the program text. Again, other less common outputs are
described separately. Finally, the table indicates those techniques that are suitable for practical application
and those that remain, to varying degrees, impractical for widespread application at the present time.
Table  3-1.  Key  Features  of  Dynamic  Analysis  Techniques  for  Sequential  Programs
3.1.1  Path  Selection  Techniques

Path  selection  techniques  focusing  on  the  control  flow  through  a  program  were  the  first  systematic 
testing strategies to be developed. Here the control structure of a program is represented as a finite,
directed graph with single entry and single exit points which correspond to the program beginning and
end.  The  nodes  in  the  graph  represent  program  statements,  connected  by  edges  which  correspond  to 
possible  control  flows  between  statements. 

The set of techniques based on control flow graphs is collectively known as structural testing techniques.
The initial three categories of structural testing techniques were: statement testing, branch testing,
and path testing. Of these, statement testing requires executing a program with test data that cause
each program statement to be executed at least once and is the weakest approach. At the other extreme,
testing of all program paths is considered the ideal. However, since the total number of possible program
paths in a typical large computer program is in the range of 10^ to 10^ (without counting the number of
loop  traverses  within  each  path),  exhaustive  path  testing  is  generally  infeasible.  Branch  testing  requires 
executing  all  program  branches  at  least  once  and  is  commonly  agreed  to  be  the  minimum  acceptable  path 
selection  criterion. 
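
The gap between branch testing and path testing can be illustrated with a small sketch. It assumes a hypothetical control flow graph made of k consecutive if/else decisions (the node names and the helper functions `diamond_chain`, `count_paths`, and `count_branches` are illustrative, not taken from any tool discussed in this report). The number of branches grows linearly in k, while the number of entry-to-exit paths doubles with every decision, even before loops are considered.

```python
from functools import lru_cache


def diamond_chain(k):
    """Build a hypothetical control flow graph with k consecutive if/else decisions.

    The graph is a dict mapping each node to its successor nodes, with a
    single entry node ("entry") and a single exit node ("exit").
    """
    graph = {}
    prev = "entry"
    for i in range(k):
        then_node, else_node, join = f"then{i}", f"else{i}", f"join{i}"
        graph[prev] = [then_node, else_node]   # the decision
        graph[then_node] = [join]
        graph[else_node] = [join]
        prev = join
    graph[prev] = ["exit"]
    graph["exit"] = []
    return graph


def count_paths(graph, start="entry", end="exit"):
    """Count distinct start-to-end paths in an acyclic control flow graph."""
    @lru_cache(maxsize=None)
    def paths_from(node):
        if node == end:
            return 1
        return sum(paths_from(nxt) for nxt in graph[node])
    return paths_from(start)


def count_branches(graph):
    """Count decision outcomes: edges leaving nodes with more than one successor."""
    return sum(len(succ) for succ in graph.values() if len(succ) > 1)


for k in (3, 10, 20):
    g = diamond_chain(k)
    print(k, "decisions:", count_branches(g), "branches,", count_paths(g), "paths")
```

Under these assumptions, two test cases can in principle exercise every branch of the chain, whereas path testing needs one test case per path.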

More  recently,  different  approaches  for  developing  intermediate  strategies  between  testing  all 
branches and all paths have been developed. Woodward, Hedley, and Hennell [Wood80a], for example,
consider  Linear  Code  Sequence  and  Jump  (LCSAJ)  program  units.  These  are  sections  of  the  code 
through  which  the  flow  of  control  proceeds  sequentially  until  terminated  by  a  jump.  Program  execution 
paths  can  be  described  in  terms  of  concatenated  LCSAJs.  A  series  of  progressively  more  demanding 
coverage  measures  is  then  based  on  examining  the  proportion  of  distinct  subpaths  of  length  n  LCSAJs 
exercised  by  the  test  data. 

Another set of techniques intended to bridge the gap between branch and path testing is based on
data flow analysis. These techniques reflect the intuition that the path from a variable assignment to its
use must be executed to provide confidence that the correct value was assigned to that variable. Consequently,
data flow testing techniques select test data forcing the execution of different interactions
between a variable definition and references to that variable. As an example of the complexity and maturity
of current testing techniques, the underlying theory of data flow testing is summarized in Figure 3-1
(taken from [Fran86]).
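
As a rough illustration of the definition/use interactions that data flow testing targets, the following sketch enumerates def-use associations for a small, hypothetical four-node program. The graph encoding (`succ`, `defs`, `c_uses`, `p_uses`) and the function name `def_use_associations` are assumptions made for this example; every c-use listed is taken to be a global c-use, and the sketch is not the algorithm of any particular tool.

```python
def def_use_associations(succ, defs, c_uses, p_uses):
    """Return def-c-use triples (i, j, x) and def-p-use triples (i, (j, k), x).

    A use is associated with a definition of x at node i when it can be
    reached from i along a path whose interior nodes do not redefine x.
    """
    assoc = set()
    for i, defined in defs.items():
        for x in defined:
            # p-uses on edges leaving i are reached by the trivial path.
            for k in succ.get(i, []):
                if x in p_uses.get((i, k), set()):
                    assoc.add((i, (i, k), x))
            stack, seen = list(succ.get(i, [])), set()
            while stack:
                j = stack.pop()
                if j in seen:
                    continue
                seen.add(j)
                if x in c_uses.get(j, set()):
                    assoc.add((i, j, x))
                if x in defs.get(j, set()):
                    continue  # x is redefined at j; nothing beyond j is def-clear
                for k in succ.get(j, []):
                    if x in p_uses.get((j, k), set()):
                        assoc.add((i, (j, k), x))
                    stack.append(k)
    return assoc


# Hypothetical program:
#   node 1: read x; y = 0
#   node 2: while x > 0        (p-use of x on edges (2,3) and (2,4))
#   node 3:     y = y + x; x = x - 1
#   node 4: print y
succ = {1: [2], 2: [3, 4], 3: [2], 4: []}
defs = {1: {"x", "y"}, 3: {"x", "y"}}
c_uses = {3: {"x", "y"}, 4: {"y"}}
p_uses = {(2, 3): {"x"}, (2, 4): {"x"}}

associations = def_use_associations(succ, defs, c_uses, p_uses)
for triple in sorted(associations, key=str):
    print(triple)
```

A data flow criterion then states which of these associations a test set must exercise; a test that never enters the loop, for instance, exercises only the associations that reach node 4 directly from node 1.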

Several  different  path  selection  criteria  based  on  data  flow  relationships  have  been  developed.  Clarke 
and  Richardson  have  formulated  a  uniform  model  for  three  families  of  data  flow  testing  criteria  (Rapps- 
Weyuker,  Ntafos’  required  k-tuples,  and  Laski-Korel)  and  defined  the  criteria  in  each  of  these  families  in 
terms of that model. They analyzed the path coverage of each criterion and developed a subsumption
hierarchy that demonstrates how these criteria relate to each other [Clar85a,Clar86a]. This analysis
showed  that  the  most  comprehensive  of  the  criteria  in  each  family  are  incomparable  with  each  other  as  a 
consequence  of  the  different  foci  of  the  families.  More  recently,  a  new  family  of  criteria  [Fran86]  has 
been developed to circumvent the problem, common to all path-directed testing, of identifying
non-executable paths.

Weyuker has recently completed an empirical evaluation of the complexity of data flow testing techniques
[Weyu88]. This study focused on the number of test cases needed to satisfy different criteria, one
of  the  cost  elements  in  applying  these  techniques.  While  theoretical  upper  bounds  on  the  number  of  test 
cases  needed  for  most  criteria  are  quadratic  or  exponential,  Weyuker  found  that,  in  practice,  far  fewer 
test  cases  are  necessary.  Figure  3-2  reproduces  the  information  presented  in  [Weyu84a]  and  [Weyu88]. 
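
To give a feel for how quickly the theoretical bounds diverge, the sketch below evaluates the worst-case expressions as they can be read from Figure 3-2: t+1 for all-nodes and all-edges, m+i·n for all-defs, (t²+4t+3)/4 for the all-uses group, and 2^t for all-du-paths, where n, m, i, and t are the numbers of variables, assignments, input statements, and conditional statements. The function name `upper_bounds` and the sample program sizes are assumptions chosen for illustration only.

```python
def upper_bounds(n, m, i, t):
    """Worst-case number of test cases per criterion (as read from Figure 3-2).

    n: variables, m: assignments, i: input statements, t: conditional statements.
    """
    return {
        "all-nodes": t + 1,
        "all-edges": t + 1,
        "all-defs": m + i * n,
        "all-uses": (t * t + 4 * t + 3) / 4,
        "all-du-paths": 2 ** t,
    }


# Hypothetical program sizes, chosen only to show the growth rates.
for t in (5, 10, 20):
    bounds = upper_bounds(n=5, m=10, i=2, t=t)
    print(t, bounds)
```

These are upper bounds only; the empirical results summarized above indicate that the number of test cases actually needed was far smaller.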

In  a  series  of  papers  [Howd78a,Howd78b,Howd82a],  Howden  has  addressed  the  theoretical 
The subprogram under test is represented by a
flow graph. Variable occurrences are classified
as definitions, undefinitions, or uses; where uses
either directly affect the computation or reveal
the result of some earlier definition (c-use), or
directly affect the flow of control (p-use).

A c-use of a variable x in node i is defined to be a
global c-use if the value of x has been assigned in
some block other than block i. A path
$(i, n_1, \ldots, n_m, j)$, $m \ge 0$, containing no definitions or
undefinitions of x in nodes $n_1, \ldots, n_m$ is called a
def-clear path wrt x from node i to node j and
from node i to edge $(n_m, j)$. A node i has a global
definition of x if it has a definition of x and there
is a def-clear path wrt x from node i to some
node containing a global c-use or edge containing
a p-use of x. The subprogram’s def-use graph is
obtained by associating with each node i, the sets
c-use(i) and def(i), and with each edge (i,j) the set
p-use(i,j). In addition, assumptions include that
the entry node has a definition of each parameter
and each global variable which occurs in the subprogram,
and the exit node has an undefinition of
each local variable and a c-use of each variable
parameter.
A def-c-use association is a triple (i,j,x) where i
is a node containing a global definition of x and
$j \in dcu(x,i)$. A def-p-use association is a triple
(i,(j,k),x) where i is a node containing a global
definition of x and $(j,k) \in dpu(x,i)$. A simple path
is one in which all nodes, except possibly the first
and last, are distinct. A loop-free path is one in
which all nodes are distinct. A path
$(n_1, \ldots, n_j, n_k)$ is a du-path wrt x if $n_1$ has a global
definition of x and either: i) $n_k$ has a c-use of x
and $(n_1, \ldots, n_j, n_k)$ is a def-clear simple path wrt x,
or ii) $(n_j, n_k)$ has a p-use of x and $(n_1, \ldots, n_j)$ is a
def-clear loop-free path wrt x.


Informally, the testing criteria require that test
data execute def-clear paths from each node containing
a global definition of a variable to
specified nodes containing global c-uses and
edges containing p-uses of that variable. For
each variable definition, all def-clear paths wrt
that variable from the node containing the
definition to some of the uses reachable by some
such path must be executed. More precisely:
The  following  relationship  holds:
Figure  3-1.  Data  Flow  Testing  Theory
Theoretical  Complexity

Let  P  be  a  program  with  n  variables,  m  assignments,  i  input  statements,  and  t  conditional  statements. 
Let  a  test  case  consist  of  a  single  vector  of  input  variables. 

Then, the all-nodes and all-edges criteria require at most $t + 1$ test cases. All-defs requires at most $m + i \cdot n$
test cases. The all-p-uses, all-c-uses/some-p-uses, all-p-uses/some-c-uses and all-uses criteria require at
most $\frac{1}{4}(t^2 + 4t + 3)$ test cases. All-du-paths requires at most $2^t$ test cases.

Empirical  Complexity 

The automated data flow testing tool ASSET was applied to a suite of programs from “Software Tools in
Pascal” by Kernighan and Plauger [Kern81]. Testers were instructed to select atomic test cases using the
strategy  of  their  choice.  The  data  flow  criteria  were  used  as  adequacy  measures.  For  each  program  the 
following  was  computed: 


1. ... t is the number of test cases sufficient to satisfy each
criterion and d is the number of decision
statements

2. Weighted average of the ratios of d to t

3. Maximum value of the ratio of t to d

4. Weighted average of the ratios of the
theoretical upper bound on the number of test
cases needed to satisfy t

5. Weighted average of the ratios of the number
of test cases sufficient to satisfy all-defs to t

6. Weighted average of the ratios of the number
of test cases sufficient to satisfy all-uses to the
number sufficient to satisfy all-du-paths


Although  the  theoretical  upper  bounds  on  the  number  of  test  cases  needed  to  satisfy  most  of  the  criteria 
are  quadratic  or  exponential,  in  practice  only  small  numbers  of  test  cases  (as  compared  to  the  program 
size)  were  needed. 

Qualitative  observations  from  empirical  study:

—  All-p-uses  was  generally  harder  to  satisfy  than  all-c-uses,  and  a  test  which  satisfied  all-p-uses  usually 
satisfied  all-c-uses  too  (despite  the  fact  that  these  are  independent  criteria). 

—  Although  all-uses  is  more  demanding  than  all-p-uses,  a  test  set  which  was  adequate  when  assessed  by 
all-p-uses  was  generally  also  adequate  using  all-uses. 

—  Even  though  the  all-du-paths  criterion  has  an  exponential  upper  bound  whereas  the  all-uses  criterion 
has  a  quadratic  upper  bound,  in  practice  test  sets  sufficient  to  (almost)  satisfy  all-uses  were  frequently 
also sufficient to (almost) satisfy all-du-paths.
Figure  3-2.  Complexity  of  Data  Flow  Testing
effectiveness of several path selection and test data selection techniques. Of particular interest is
Howden’s evaluation of path analysis since the reliability of path testing places an upper bound on the
reliability of all techniques that are based on testing of a subset of a program’s paths. The conclusion
given in [Howd76c] is: “If it were possible to test every path in a program, then path testing is found to be
reliable or almost reliable for about 65% of the [faults] found in the small survey of 11 [small, sequential]
programs in Kernighan and Plauger [Kern74a].”

In  addition  to  being  used  as  test  data  generation  strategies,  these  path  selection  techniques  can  be 
used  as  coverage  measures.  In  this  role,  the  code  is  instrumented  to  monitor  the  different  control  or  data 
elements that are executed in the course of testing. The adequacy of the completed testing is then measured
as a function of the percentage of the structural units executed.
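
The use of path selection criteria as coverage measures can be sketched as follows. The unit under test (`apply_discount`), its probe labels, and the `coverage` helper are hypothetical and hand-instrumented for this example; real coverage analyzers insert equivalent probes automatically. The sketch shows one test achieving full statement coverage while leaving half of the branch outcomes unexercised.

```python
def apply_discount(price, is_member, trace):
    """Hypothetical unit under test, instrumented by hand with probes."""
    trace.add("s1")                                  # statement 1: initialise
    discount = 0
    trace.add("b1:true" if is_member else "b1:false")  # outcome of the decision
    if is_member:
        trace.add("s2")                              # statement 2: true branch only
        discount = 10
    trace.add("s3")                                  # statement 3: return
    return price - discount


ALL_STATEMENTS = {"s1", "s2", "s3"}
ALL_BRANCHES = {"b1:true", "b1:false"}


def coverage(test_inputs):
    """Run the tests and return (statement coverage, branch coverage) as fractions."""
    trace = set()
    for price, is_member in test_inputs:
        apply_discount(price, is_member, trace)
    statement = len(trace & ALL_STATEMENTS) / len(ALL_STATEMENTS)
    branch = len(trace & ALL_BRANCHES) / len(ALL_BRANCHES)
    return statement, branch


one_test = coverage([(100, True)])
two_tests = coverage([(100, True), (100, False)])
print("one test :", one_test)
print("two tests:", two_tests)
```

The second test adds no new statements but exercises the remaining branch outcome, which is why branch coverage is the stronger of the two measures.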

3.1.2  Error-Based  Techniques 

Much recent research has been directed at the notions of reliable and adequate test data first introduced
by Goodenough and Gerhart in 1975. Their theorem, called the “fundamental theorem of testing,”
characterized the properties of a completely effective test data selection strategy based on
definitions of reliability, validity, and completeness. Essentially, a test data selection strategy is said to be
reliable if it guarantees to generate test data capable of detecting every fault in a program [Good75a].
This work was the first formal, systematic approach to a technology which had previously been characterized
by its dependence on the software developer’s intuition. It has been one of the major influences
on the direction and scope of later work.

Although Goodenough and Gerhart provided insight into how to develop effective program tests, they
did not develop an actual testing approach. Weyuker and Ostrand modified and refined Goodenough and
Gerhart’s properties for ideal tests to investigate a testing strategy that combines consideration of likely
potential faults with more traditional path selection and functional testing approaches [Weyu80c]. The
program’s input domain is partitioned based on program-independent and structural properties of the
program, as well as potential faults that have been identified as likely for the problem being solved. This
results in the identification of revealing subdomains. The key property of a revealing subdomain is that
the existence of one element of the subdomain which leads to incorrect processing when used as an input
implies that all of the subdomain’s elements are processed incorrectly. Equivalently, if any input in the
subdomain is processed correctly, then all inputs in that subdomain are processed correctly. Selection of
test data, therefore, is reduced to choosing
an arbitrary element from each subdomain. This is sufficient to show the absence, or presence, of the
particular types of faults being considered. Although this strategy is largely a research vehicle and has
not been developed to the state of a practical testing technique, it remains interesting since it demonstrates
the goals to which many researchers have been working.

There is still no general theory that states whether a piece of test data protects all of the program
executions along a particular path from all kinds of faults. Even so, several testing techniques have been
developed that achieve test data adequacy for certain limited, well-defined types of faults. Indeed, some
of these techniques reliably demonstrate the absence of prespecified types of faults.

A commonly used classification is to distinguish program faults as either computation, domain, or
missing path faults. (This classification was first introduced and analyzed by Howden in [Howd76c].)
Computation faults result from incorrect operations performed along a correct execution path, such as
missing or inappropriate assignment statements. Domain faults result from incorrect path traversals that
occur due to path selection faults. Finally, missing path faults occur when some special case requires a
unique sequence of actions, but the program does not contain a path whose execution will cause that
sequence of actions. Following this distinction, Clarke and Richardson have developed a strategy for
selecting test data sensitive to particular computation faults based on analyzing a symbolic representation
of the path computation [Clar83b].
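
The three fault classes can be illustrated with a small sketch. The shipping-fee routine below and all of its names are hypothetical and are not taken from [Howd76c]; the function fee is simply assumed to be the correct behaviour, and each variant seeds one class of fault.

```python
def fee(weight):
    """Assumed correct behaviour for this sketch."""
    if weight <= 0:
        raise ValueError("weight must be positive")  # special case
    if weight <= 10:
        return 5
    return 8 + 2 * (weight - 10)


def fee_computation_fault(weight):
    if weight <= 0:
        raise ValueError("weight must be positive")
    if weight <= 10:
        return 5
    return 8 + 2 * weight  # correct path, incorrect computation


def fee_domain_fault(weight):
    if weight <= 0:
        raise ValueError("weight must be positive")
    if weight < 10:  # incorrect predicate: weight == 10 takes the wrong path
        return 5
    return 8 + 2 * (weight - 10)


def fee_missing_path_fault(weight):
    # The path for the special case (non-positive weight) does not exist.
    if weight <= 10:
        return 5
    return 8 + 2 * (weight - 10)
```

In this sketch the computation fault shows up for any weight above 10, the domain fault only at the boundary value 10, and the missing path fault only when a test derived from the specification supplies a non-positive weight. No test selected from the structure of fee_missing_path_fault alone would ask for that case.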

Howden and Zeil have proposed additional computation testing approaches based on the use of algebraic
techniques for defining a neighborhood of functions. Howden’s algebraic testing [Howd78b] establishes
rules for choosing data to differentiate among all members of a functional class, and then applies
those rules to any program whose output is expected to fall within that class. Zeil’s perturbation testing
[Zeil83a], developed for testing numeric code, involves the derivation of those members of a chosen
functional class which are indistinguishable from the program function using all the test data so far
applied. The key idea is to add a perturbing function to expressions occurring in the software and derive
the conditions under which that fault could go undetected by a given test path. In this way, perturbation
testing is really a path selection adequacy method, though it can also be used to generate test data to
reveal faults in arithmetic expressions.
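
The idea of perturbations that remain indistinguishable under the test data applied so far can be sketched for one very simple functional class. The code below is an illustration under stated assumptions, not Zeil's algorithm: it assumes the perturbing function is linear, e(x, y) = a + b*x + c*y, and that the perturbed expression is directly visible in the output, so a perturbation goes undetected exactly when it evaluates to zero at every test point. The function names are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def undetected_dimension(test_points):
    """Dimension of the space of linear perturbations a + b*x + c*y
    that vanish at every (x, y) test point."""
    rows = [[1, x, y] for (x, y) in test_points]
    return 3 - rank(rows)
```

Under these assumptions, the test points (0, 0) and (1, 1) leave a one-dimensional family of undetectable perturbations; adding the collinear point (2, 2) does not help, while adding (0, 1) reduces the family to the zero perturbation.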

In  a  system  called  EQUATE,  Zeil  merges  perturbation  testing  and  mutation  testing  (see  Section  3.1.3) 
to  provide  another  technique  capable  of  finding  faults  in  the  execution  of  a  program  path.  Primary  goals 
are  to  remove  the  limitation  of  perturbation  testing  to  numeric  domains  and  the  limitation  of  mutation 
testing  to  detection  of  simple  faults.  An  additional  goal  is  to  overcome  the  decrease  in  effectiveness 
suffered by both methods for programs employing high levels of data and functional abstraction.
EQUATE  selects  a  number  of  test  locations  throughout  the  program  and  chooses  a  set  of  expressions 
derived  from  the  abstract  syntax  tree  of  the  module  being  tested.  Test  data  is  required  that  distinguishes 
each  pair  of  these  expressions  from  one  another  at  every  test  location  [Zeil86].  Zeil  describes  a  set  of 
designated  expressions  and  constants,  called  terms,  that  are  formed  from  the  union  of  the  following  three 
subsets: 


1.  The  set  of  all  expressions  and  subexpressions  from  the  abstract  syntax  tree  of 
the  module  under  test,  called  the  expression  set  of  the  module. 

2. The set of values first taken on by each expression set term at each test location
during testing, called the initial value set.

3.  The  set  of  expressions  that  can  be  formed  by  substituting  any  member  of  the 
expression  set  for  any  subexpression  of  another  expression  set  member,  called 
operand  substitution  terms. 

Test  locations  occur  at  the  beginning  of  each  basic  block  and  immediately  following  each  statement  in 
the  block. 
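
As an illustration of the first subset only, the expression set of a module can be collected by walking its abstract syntax tree. The sketch below uses Python and its standard ast module as a stand-in; this is an assumption made for illustration, since EQUATE itself is not described here as operating on Python, and the function name is hypothetical. Assignment targets are excluded because they are not evaluated as expressions.

```python
import ast


def expression_set(source):
    """Distinct expressions and subexpressions appearing in `source`."""
    terms = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.expr):
            continue
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            continue  # skip assignment targets
        terms.add(ast.unparse(node))  # requires Python 3.9 or later
    return terms
```

For the one-line module "y = a * b + 1" this yields the five terms a, b, 1, a * b, and a * b + 1. The initial value set and the operand substitution terms would then be built on top of this set.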

Cohen  and  White  have  developed  a  technique  called  domain  testing  that  guarantees,  within  a  given 
error  bound,  to  detect  path  selection  faults.  This  technique  guides  the  selection  of  test  data  by  geometric 
analysis of path domain boundaries (where the boundary of a path domain is determined by the conditional
branches that are taken along the path). It generates test points on and near each boundary that
can  detect  whether  a  domain  error  has  occurred.  If  so,  one  or  more  of  the  boundaries  will  have  shifted, 
or  the  corresponding  predicate  relational  operator  will  have  changed.  Otherwise,  if  the  program  yields 
correct  results  for  the  chosen  test  data,  the  path  domain  boundary  is  correct  within  the  error  bound. 
Cohen  and  White  proposed  the  first  domain  test  data  selection  strategies  and  error  bound  criteria 
[Whit78b],  and  later  strategies  which  reduce  the  displaced  domain  associated  with  an  undetectable 
border  and  offer  lower  complexity  [Whit86].  One  of  the  serious  limitations  of  domain  testing  is  the  need 
to  examine  the  potentially  infinite  number  of  domains  that  arise  from  iterated  loops.  Recent  work  has 
focused  on  reducing  this  burden  [WhitSSa]. 
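
A one-dimensional sketch may help show what "on and near each boundary" and "within the error bound" mean. It is a simplification and not the Cohen and White strategy itself, which treats borders in higher-dimensional input spaces; the predicate x <= 5, the variants, and the value of epsilon are all assumptions chosen for illustration.

```python
def reference(x):
    """Assumed correct path selection: the border lies at x == 5."""
    return "low" if x <= 5 else "high"


def shifted_border(x):
    return "low" if x <= 6 else "high"    # border shifted by 1


def wrong_operator(x):
    return "low" if x < 5 else "high"     # relational operator changed


def small_shift(x):
    return "low" if x <= 5.2 else "high"  # shift smaller than epsilon


def border_tests(border, epsilon):
    """One point on the border and one point at distance epsilon on each side."""
    return [border - epsilon, border, border + epsilon]


def detects(candidate, points):
    return any(candidate(p) != reference(p) for p in points)


points = border_tests(5, 0.5)
```

With epsilon set to 0.5, the shifted border and the changed operator are both detected, whereas the shift of 0.2 is not. This is the sense in which a passing test set shows the border to be correct only within the error bound.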

Partition analysis testing [RichSSa] integrates several testing strategies, such as domain, computation,
and algebraic testing and a form of formal verification. It compares a program to its formal specification
and so is one of the few testing approaches that can detect missing path faults. The overall strategy
requires partitioning the input domain into procedure subdomains so that the elements of each subdomain
are treated uniformly by the specification and processed uniformly by the implementation. Partition
verification, a variation on symbolic testing (see Section 4.1), is performed to demonstrate the consistency
between the specification and its implementation. This verification is enhanced by partition testing.
Partition testing uses information related to each subdomain to guide the selection of test data
which, in executing the program, helps to determine whether the program conforms to its specification.

3.1.3  Fault-Based  Techniques 

Fault-based  techniques  differ  from  error-based  techniques  in  that  they  examine  program  statements  to 
demonstrate  the  absence  of  a  predefined  set  of  faults. 

Morell [MoreSS] describes fault-based testing as a three-stage process operating in an arena consisting
of (1) a specification, (2) a program, and (3) the domain of interest which is the source of test data,
together with a prescribed list of potential faults, called alternatives. The first stage requires identifying
the locations in the program where the alternatives might lie. Then a test set is developed which executes
these locations, yet yields correct output. Finally, information collected during the execution is used to
deduce that no alternative could have been inserted into the program without being detected by the test.
He goes on to provide a model of fault-based testing which can be used to investigate the theoretical
limitations of this group of techniques. Two orthogonal attributes are used to categorize fault-based testing
techniques. The breadth of a technique is given by the number of potential faults considered; it may be
finite or infinite. The extent relates to the information used to determine the absence of faults and may
be local or global.

While  all  fault-based  approaches  support  test  data  generation,  they  are  primarily  test  data  adequacy 
measurement  techniques.  One  of  the  earliest  and  perhaps  the  best  known  is  mutation  testing,  also  called 
mutation analysis [Budd80a]. Mutation testing requires the definition of a set of mutation transformations,
called error operators, which are applied singly to the elementary components of a program to
introduce  certain  types  of  simple  errors  as  faults.  Test  data  that  can  distinguish  between  the  original  and 
mutated  programs  is  then  deemed  adequate  for  detection  of  that  particular  type  of  error.  The  ability  of 
these  simple  errors  to  cover  more  complex  errors  is  derived  from  the  “Coupling  Effect.”  The  restriction 
of  introducing  single  errors  is  justified  by  an  empirical  principle  called  the  “Competent  Programmer 
Hypothesis.”  These  two  assumptions  are  defined  as: 

Competent  Programmer  Hypothesis  —  The  assumption  that  the  program  to  be  tested  has  been 
written  by  a  competent  programmer  [DeMi88a]. 

Coupling  Effect  —  Test  data  that  distinguishes  all  programs  differing  from  a  correct  one  by  only 
simple  errors  is  so  sensitive  that  it  also  implicitly  distinguishes  more  complex  errors  [DeMi78]. 

Error  operators  exist  to  demonstrate  the  absence  of  common  Fortran  errors,  and  a  set  of  error  operators 
appropriate  to  Ada  programs  is  under  development  [Appe88].  Since  the  set  of  error  operators  is  limited 
and  mutants  are  distinguished  from  the  original  program  based  on  the  program  output,  mutation  testing 
has  finite  breadth  and  global  extent.  While  an  effective  technique,  it  is  expensive.  The  need  to  apply  error 
operators  individually  can  lead  to  the  generation  of  hundreds  of  mutated  programs,  each  of  which  must  be 
separately  compiled  and  executed. 
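
The mechanics of strong mutation testing can be sketched in a few lines. The example is a toy under stated assumptions: the program under test, the single error operator (relational operator replacement), and the textual way mutants are produced are all hypothetical and are not taken from any of the mutation systems cited here. Each mutant is compiled and executed separately, and a mutant is killed when some test case makes its output differ from that of the original program.

```python
SOURCE = "def max_of(a, b):\n    return a if a > b else b\n"

# Hypothetical error operator: replace the relational operator '>'.
REPLACEMENTS = [">=", "<", "<=", "==", "!="]


def load(source):
    """Compile a program text and return its max_of function."""
    namespace = {}
    exec(source, namespace)
    return namespace["max_of"]


def mutants(source):
    """One mutant per replacement, each containing a single change."""
    return [source.replace(" > ", f" {op} ", 1) for op in REPLACEMENTS]


def mutation_score(tests):
    """Return (mutants killed, mutants generated) for a list of test inputs."""
    original = load(SOURCE)
    killed = 0
    for text in mutants(SOURCE):
        mutant = load(text)  # separate compilation for every mutant
        if any(mutant(*t) != original(*t) for t in tests):
            killed += 1
    return killed, len(REPLACEMENTS)
```

With the single test (1, 2) three of the five mutants are killed; adding (2, 1) kills a fourth. The mutant using >= can never be killed because it computes the same function as the original, which is an instance of the equivalent mutant problem, and the per-mutant compile and run step is where the cost noted above arises.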

There are several variations of mutation testing which attempt to overcome some of the weaknesses of
the technique in its original form. Whereas the original strategy (strong mutation testing) applies to a
program as a whole, weak mutation testing [Howd82a,Howd87] applies to program components, usually
elementary computational structures. Since it is unnecessary to perform a separate compilation and
execution for each mutation, weak mutation testing is cheaper to apply, although it cannot guarantee
detection of faults in the function computed by a program.

Weak mutation testing was partially derived from Error Sensitive Test Case Analysis (ESTCA)
[Fost80]. This approach attempts to detect all common types of faults. As such, it was the earliest example
of an infinite breadth fault-based technique. Adapted from a hardware failure analysis technique,
ESTCA  was  developed  by  simulating  frequently  occurring  code  faults,  identifying  the  most  effective  test 
patterns  for  detecting  these  faults,  and  then  deducing  rules  for  an  algorithm  to  generate  fault  sensitive  test 
data.  The  test  data  and  expected  outputs  are  derived  from  examination  of  a  program’s  specification.  The 
program  is  then  executed  on  the  test  data  and  the  actual  results  compared  against  the  expected  results  to 
identify  failures. 

Trace mutation testing [Howd82a] uses program traces to compare the results of a program and its
mutations rather than output values, thus allowing several mutation transformations to be applied
concurrently. The most recent variation, firm mutation testing [WoodSS], claims to combine the best elements
of strong and weak mutation testing. It uses components with more extensive scope than weak mutation
testing, and permits partial execution of components so that many mutants can be applied in a single
execution.

A  form  of  mutation  testing  that  allows  validating  a  program  against  its  specification  has  also  been 
developed [BuddSS]. Specifications are given in the predicate calculus, modified to clearly indicate the
input-output  relationships  of  a  program.  These  specifications  are  mutated  by  adding  logical  clauses  to  the 
input  and  output  conditions.  Contrary  to  other  forms  of  mutation  testing,  the  goal  here  is  to  generate  test 
cases  which  satisfy  the  additional  constraints  so  that  the  original  and  mutated  specifications  produce  the 
same  result.  First  the  program  and  the  original  specification  are  executed  on  the  test  cases.  If  the  results 
from  these  executions  indicate  consistency  between  the  specification  and  its  implementation,  the  mutated 
specifications  are  executed  with  the  same  test  data.  A  mutation  is  eliminated  when  an  input  clause  is 
satisfied,  but  one  or  more  output  clauses  are  unsatisfied.  Unlike  the  forms  of  mutation  testing  which  only 
consider  a  program  implementation,  this  method  can  identify  missing  path  faults  in  a  program. 

Although  most  error-  and  fault-based  approaches  are  designed  to  activate  faults,  few  ensure  that  the 
effects  of  faults  are  propagated  through  the  program  to  be  revealed  as  failures  in  the  output.  In  order  to 
overcome this problem, much current work is investigating models for fault propagation, yielding
fault-based techniques with both infinite breadth and global extent.

Morell’s approach utilizes symbolic execution for a dynamic model of fault activation and propagation.
This is applied to mutation testing to overcome the limitation requiring individual generation and
execution of mutants. The fundamental concept underlying symbolic execution is the ability to model infinitely
many executions of a program with test data by a single symbolic execution. Morell modifies this concept
to execute infinitely many mutation alternatives in a single execution. To use Morell’s words: “For symbolic
execution the key lies in encoding infinitely many inputs by a single symbolic input. For symbolic testing
the key lies in encoding infinitely many alternatives in a single symbolic alternative” [MoreSfi]. The
modified form of execution proceeds as usual until the expression containing the symbolic alternative is
reached. Then, instead of creating a symbolic value for an input, a symbolic value simulating the result of
the mutation transformations represented by the symbolic alternative is generated. This value is
propagated through the program until it appears as an output embodying all the possible impacts of the
alternatives. This intuitively appealing approach has some drawbacks relating to undecidability problems in
program control flow. Morell is studying alternative definitions of a program path to resolve these
problems.

RELAY [Rich86a,Rich88] uses static analysis of the program to analyze fault activation and propagation.
It defines revealing conditions which guarantee that a possible fault is activated during execution and
that the fault effect transfers through computations and data flow until it is revealed. RELAY is applied by
choosing a fault classification and determining the origination conditions and transfer conditions for each
class of faults. These conditions are then evaluated for the program being tested to provide the revealing
conditions. Thus test data selection is treated in conjunction with path selection by selecting test data that
not only originates a fault but also transfers that fault to affect the output, guaranteeing the detection of
any fault of the chosen classes.
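
The distinction between originating a fault and transferring it to the output can be seen in a hand-worked sketch. RELAY derives such conditions by static analysis for whole classes of faults; the code below instead writes them out by hand for one hypothetical fault ('*' written in place of '+') in a hypothetical two-statement routine, so it illustrates the concepts rather than the RELAY model.

```python
def intended(x):
    t = x + 2
    return t > 10


def faulty(x):
    t = x * 2  # hypothetical fault: '*' written in place of '+'
    return t > 10


def originates(x):
    """Origination condition: the faulty expression takes a wrong value."""
    return x * 2 != x + 2


def transfers(x):
    """Transfer condition: the wrong value changes the enclosing comparison."""
    return (x * 2 > 10) != (x + 2 > 10)


def reveals(x):
    """Revealing condition: the fault originates and transfers to the output."""
    return originates(x) and transfers(x)
```

For x = 2 the fault does not originate at all. For x = 3 it originates (6 instead of 5) but the comparison masks it, so the output is still correct. For x = 6 it originates and transfers, and the outputs differ. Test data satisfying only the origination condition would therefore give no guarantee of detection.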

Although RELAY can be used for both test data selection and test data adequacy measurement, it was
primarily developed to serve a different goal. Researchers at the University of Massachusetts are
undertaking a long-term effort to evaluate test data selection techniques. One of the difficulties they have
encountered is that most existing techniques use different underlying models, making comparison
between techniques difficult. Therefore, RELAY was developed as a consistent model for software
testing which is expressive enough that most current testing techniques can be mapped on to it, although it is
particularly suited to those that are fault-based. RELAY has been used to analyze three fault-based
criteria: Budd’s Estimate, Weak Mutation Testing, and ESTCA. This analysis demonstrates that none of
these criteria guarantees the detection of faults. Richardson and Thompson discuss two common
weaknesses: “First, the criteria do not thoroughly consider the potential unsatisfiability of their rules;
each criterion includes rules that are sufficient to reveal [faults] for some fault classes, yet when such rules
are unsatisfiable, many [faults] may remain undetected. Second, the criteria fail to integrate their rules;
although a criterion may cause an expression to take on an erroneous value, there is no effort made to
guarantee that the enclosing expressions evaluate incorrectly” [Rich86a]. These weaknesses are by no
means exclusive to the aforementioned techniques. As yet no effective rules for showing how to transfer
faults out of loops and conditionals have been developed.

Richardson and Thompson, the developers of RELAY, are currently investigating the use of the
RELAY  model  for  integration  and  specification-based  testing.  They  are  also  enhancing  the  capabilities  of 
RELAY  with  regard  to  data  flow  transfer,  in  order  to  facilitate  processing  of  loops  and  modeling  the 
transfer  of  multiple  faults.  Other  researchers,  see  [LongSS],  are  attempting  to  extend  the  RELAY  model 
for  application  in  fault-based  testing  of  concurrent,  real-time  software. 

There  are  three  necessary  and  sufficient  conditions  for  fault-based  testing  to  ensure  that  a  program  is 
correct  with  respect  to  its  specification  [MoreSS] : 


1.  The  fault-based  arena  must  be  alternate-sufficient.  That  is,  either  the  original 
program  or  one  of  the  alternate  programs  must  be  correct.  Since  there  is  no 
algorithm  to  determine  alternate-sufficiency,  the  fault-based  arena  must  be 
assumed  alternate-sufficient  until  proven  otherwise. 

2. Coupling does not occur for the test set. That is, each alternate can be independently
detected by a test set, along with combinations of alternatives that apply.
Again, it has been shown in [More84] that no algorithm for determining coupling
exists.

3.  Coincidental  correctness,  where  a  fault  on  an  executed  path  does  not  produce 
erroneous  results,  does  not  occur. 

These conditions are undecidable. Therefore fault-based testing cannot guarantee that a program is
correct. It does, however, offer valuable information on the absence of certain faults. Since software
reliability is directly related to the presence or absence of faults, this information could be exploited in
software reliability measurement. Morell is investigating the use of his model of fault-based testing as the
basis for a white-box model of software reliability.

3.1.4 Functional Techniques

The majority of the previously discussed white-box testing approaches consider only the program
implementation. By ignoring its specification, these methods are limited to testing what the program does,
rather than what it is intended to do. Functional testing techniques do not consider the internal structure
of programs. Instead, generation of test data (and expected results) relies on the specification of program
function, inputs, and outputs. In addition to revealing faults in programs, these black-box techniques
usually have the desirable side-effect of detecting ambiguities and incompleteness in program specifications.

Howden has recently shown how different testing and analysis methods relate to an expanded
interpretation of functional testing, in which a system is viewed as a structure of related functions. This approach
can also be categorized as an error-based approach since it looks directly at how programmers make
errors, as opposed to the possible fault effects of those errors. He provides a model of how software is
constructed and of the reasoning errors that humans make during this process. From the premise that
software is developed by synthesizing functions, Howden discusses two views of programs. These are: (1)
hierarchical top-down functional structures, and (2) horizontal state transition diagrams. Completeness of
testing is then achieved by identifying the fault effects of possible errors and constructing methods for
detecting these faults. This work is reported in his book, Functional Program Testing and Analysis
[Howd87]. Since completing the book, Howden has investigated a more general error model. This new
model takes the view that abstraction and decomposition are the major tools for reasoning about complex
objects, and different testing and evaluation techniques are suitable for the different kinds of errors which
may be made during these processes. Although not yet completed, this later work relates the different
forms of dynamic and static analysis and addresses all software products, not just code.

The earliest functional strategies were equivalence partitioning [Myer79], boundary-value analysis
[Myer79], and cause-effect graphing [Elme73]. Recognizing that exhaustive testing is rarely feasible, these
three strategies address the problem of trying to select the subset of all possible inputs that has the
highest probability of finding the most faults.

In  equivalence  partitioning  the  input  domain  of  a  program  is  partitioned  into  a  finite  number  of 
equivalence  classes  where,  it  is  assumed,  a  test  of  a  representative  value  of  each  class  is  equivalent  to  a 
test  of  any  other  value.  This  implies  that  if  one  test  case  in  an  equivalence  class  detects  a  fault,  all  other 
test  cases  would  be  expected  to  find  the  same  fault.  The  minimal  set  of  test  cases  covering  all  equivalence 
classes  is  then  developed  by  selecting  test  cases  that  invoke  as  many  different  input  conditions  as  possible. 
Boundary  value  analysis  differs  from  equivalence  partitioning  by  selecting  values  in  both  input  and  output 
equivalence  classes  that  test  the  edges  of  each  class.  These  values  are  often  called  extremal/special  values, 
where  extremal  values  are  those  that  lie  on  the  edges  of  variable  domains  and  special  values  are  those  with 
special  mathematical  properties.  Since  few  other  testing  techniques  provide  good  coverage  of 
extremal/special  values,  this  approach  is  frequently  included  in  other  techniques.  Cause-effect  graphing 
offers  some  advantages  over  the  other  two  techniques  by  exploring  combinations  of  input  conditions. 
Here,  a  program’s  output  domain  is  partitioned  into  effect  classes,  which  also  results  in  partitioning  the 
input  domain  into  causes  that  correspond  to  particular  effects.  The  different  classes,  and  links  between 
them,  are  expressed  in  a  combinatorial  logic  network  called  a  cause-effect  graph.  The  dependencies  thus 
revealed  are  used  to  derive  appropriate  test  cases. 
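
The difference between equivalence partitioning and boundary value analysis can be sketched on a small example. The specification below (a score from 0 to 100 graded as fail or pass, with out-of-range scores rejected), the class table, and the seeded fault are all hypothetical; the sketch derives one representative per input equivalence class and the values at the edges of each class, then runs both sets against the faulty implementation.

```python
# Hypothetical input equivalence classes: (expected result, low edge, high edge).
CLASSES = [
    ("invalid_low", None, -1),
    ("fail", 0, 59),
    ("pass", 60, 100),
    ("invalid_high", 101, None),
]


def expected(score):
    """Oracle derived from the class table above."""
    for name, lo, hi in CLASSES:
        if (lo is None or score >= lo) and (hi is None or score <= hi):
            return name


def grade_faulty(score):
    if score < 0:
        return "invalid_low"
    if score > 100:
        return "invalid_high"
    return "pass" if score > 60 else "fail"  # seeded fault: should be >= 60


def representatives(reach=10):
    """One interior value per equivalence class."""
    values = []
    for _, lo, hi in CLASSES:
        if lo is None:
            values.append(hi - reach)
        elif hi is None:
            values.append(lo + reach)
        else:
            values.append((lo + hi) // 2)
    return values


def boundary_values():
    """The values lying on the edges of each class."""
    edges = {v for _, lo, hi in CLASSES for v in (lo, hi) if v is not None}
    return sorted(edges)


def failing(inputs):
    return [x for x in inputs if grade_faulty(x) != expected(x)]
```

Here the four representatives all pass, while the boundary values expose the fault at the score 60. This is consistent with the observation above that boundary value analysis is frequently added to other techniques.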

Another technique in this group is random testing. This technique is based on the idea of testing a
program by sampling for faults, and a measure of the program’s correctness is given by the proportion of
elements in the input domain for which it fails to execute correctly. It not only can uncover faults but
provides a reliability measure for the program. In this latter function, the number of failures in a set of test
cases is related to the correctness measure via a probability distribution function that depends on the way
test data is chosen. Test cases are randomly generated from either a uniform distribution of a program’s
input domain or from its operational profile. Once the input domain or operational profile is determined,
generation of random test data is usually much easier and cheaper than other test data generation
strategies. Poor selection of the range can, however, lead to wasteful generation and execution of test data,
and inconclusive results. Also, accurate identification of the input domain or operational profile depends
on the ability to predict the actual operating environment. This is not always possible.
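
The dependence of the resulting measure on the way test data is chosen can be sketched as follows. The program, its seeded fault, the input range, and the two sampling distributions are all assumptions made for illustration; in particular the "operational profile" is simply assumed to contain only non-negative inputs.

```python
import random


def oracle(x):
    return abs(x)


def program(x):
    if -200 <= x < 0:  # seeded fault affecting 200 of the 2000 possible inputs
        return x
    return abs(x)


def uniform(rng):
    """Uniform distribution over the assumed input domain."""
    return rng.randint(-1000, 999)


def operational(rng):
    """Assumed operational profile: negative inputs never occur in use."""
    return rng.randint(0, 999)


def estimate_failure_rate(sample, trials, seed=0):
    """Proportion of randomly sampled inputs on which the program fails."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    failures = 0
    for _ in range(trials):
        x = sample(rng)
        if program(x) != oracle(x):
            failures += 1
    return failures / trials
```

Sampling uniformly gives an estimate near the true failure proportion of 0.1 for the whole domain, while sampling from the assumed operational profile reports no failures at all. Neither figure is wrong; each measures the program against a different prediction of the operating environment.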

Statistical testing is a variation on random testing. In this case, the emphasis is on certifying the
operational effectiveness of the software, with an estimate of product reliability being given in terms of the
mean time to failure (MTTF) [Dyer^a]. Fault detection plays a secondary role. Test data is selected based
on the anticipated statistical distribution of the operational data to provide a realistic simulation of the
product environment. Randomized sampling techniques are used to introduce efficiencies in the size and
content  of  the  samples.  This  approach  provides  a  technical  basis  for  making  statistical  inferences  about 
operational  effectiveness  based  on  test  results.  These  results  can  be  extrapolated  to  provide  operational 
reliability  estimates. 

While the value of random testing for detecting faults with a low probability of occurrence is uncertain,
recent, small-scale empirical experiments suggest that it potentially has a valuable role to play as one
element in an integrated testing strategy. For example, Duran and Ntafos have conducted several
experiments which indicate that random testing gives high levels of structural and fault-based coverage adequacy
measures. In the first case, these researchers report: "... of the 5 programs tested, a moderate number of
random test cases on the average achieved 97% of segment testing, 93% of branch testing, 57% of
structured path testing, 72% of required pairs testing [NtafSla], 81% of [LCSAJ measure] TER3, and 74% of
[LCSAJ measure] TER4” [Dura84]. In the second case: "Seven programs were tested, using the mutation
system at Georgia Tech, with from 8 to 20 random test cases. 79% of the mutants were eliminated by the
test cases as compared with 84% for branch testing and 90% for required pairs testing (the number of
remaining mutants includes those that are equivalent to the original program)” [NtafSla]. Additional
experimentation  to  substantiate  these  findings  is  needed. 

The  advent  of  formal  specification  languages  has  given  rise  to  new  types  of  functional  testing  where  a 
program specification is actively used to support the testing process, usually through automatic generation
of  implementation-independent  test  data.  Although  many  examples  exist  (see  [Gogu79a,Muss79,Gorl87]), 
this  subsection  focuses  on  three  techniques  which  illustrate  the  ongoing  work  in  this  area. 

Grammar-based  testing  techniques  rely  on  the  use  of  test  grammars  to  generate  program  inputs  and 
expected  outputs.  The  test  grammar  is  developed  from  a  formal  specification  of  the  program  and  provides 
a separate partial specification that can be compared against the original specification to uncover
ambiguities. During implementation, the partial specification and the implementation are executed on identical
test  data  and  the  results  compared  to  identify  faults.  One  particular  strategy,  using  attribute  context-free 
grammars  [Dunc81],  allows  random  generation  of  test  cases  or  the  application  of  testing  heuristics  (for 
example,  boundary  value  analysis,  or  ESTCA  testing)  for  more  systematic  test  case  generation. 

The  other  two  approaches  use  algebraic  specifications.  An  algebraic  specification  is  made  up  of  a  list  of 
functions  on  a  set  of  sorts  (types)  and  a  set  of  axioms  defining  properties  of  the  defined  functions.  The  first 
of these approaches uses algebraic axioms of abstract data types to aid the testing process. The second
uses  logic  programming  to  generate  test  data  sets  from  algebraic  data  type  specifications. 

The Data-Abstraction Implementation, Specification, and Testing System (DAISTS) [Gann81]
provides a language for formal expression of the semantics of data abstractions, independent of their
implementation. This allows a program implementing abstract data types to be annotated with
nonprocedural, algebraic axioms and test cases to facilitate mechanical consistency checks between the
axioms and the implementation. Although the same software developer may write both axioms and
implementation, the orthogonal nature of these two representation forms reduces the likelihood of the
same faults occurring in each. DAISTS is a compiler-based system. The compiler prepares a “program”
which utilizes the axioms as a test driver for the implementation. The software developer provides test
points in the form of expressions using the abstract functions. These are fed to the program which then
cycles through the test data, monitoring the execution of both axioms and implementation to determine
if they agree. There are two significant benefits to this approach. First, the software developer only
supplies test inputs; an oracle is not needed. Second, while the developer provides the implementation,
axioms, and test points, DAISTS automates the testing by developing the necessary test drivers.
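
The idea of using axioms as a test driver can be illustrated with a minimal sketch. It is written in
Python rather than in the DAISTS language, and the Stack class, the three axioms, and the test points
are hypothetical examples chosen for illustration; none of them is taken from [Gann81].

```python
class Stack:
    """Hypothetical implementation under test: an immutable stack."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def push(self, x):
        return type(self)(self._items + (x,))

    def pop(self):
        return type(self)(self._items[:-1])

    def top(self):
        return self._items[-1]

    def is_empty(self):
        return not self._items

    def __eq__(self, other):
        return isinstance(other, Stack) and self._items == other._items


# Each axiom states that two expressions over the abstract functions agree.
AXIOMS = {
    "pop(push(s, x)) = s": lambda s, x: s.push(x).pop() == s,
    "top(push(s, x)) = x": lambda s, x: s.push(x).top() == x,
    "is_empty(push(s, x)) = false": lambda s, x: not s.push(x).is_empty(),
}


def run_axioms(test_points):
    """Evaluate every axiom at every test point and return the failures."""
    failures = []
    for s, x in test_points:
        for name, holds in AXIOMS.items():
            if not holds(s, x):
                failures.append((name, x))
    return failures


# Test points are inputs only; no expected outputs are written down.
TEST_POINTS = [(Stack(), 1), (Stack([1]), 2), (Stack([1, 2, 3]), 4)]
```

Calling run_axioms(TEST_POINTS) returns an empty list for this implementation. The developer supplies
only inputs; agreement between the two sides of each axiom plays the part of the oracle.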

Choquet  and  colleagues  have  developed  an  alternative  approach  for  generating  test  data  from  algebraic 
data  type  specifications.  This  approach  arose  from  the  noted  similarity  between  an  algebraic  specification 
and  a  logic  program  (both  of  which  describe  logical  properties),  and  exploits  logic  programming  for  test 
data generation. It is based on a formalized theory of testing. The basic assumption for test construction is
the  Correlation  Principle  which  states:  “There  exists  a  narrow  correlation  between  specification  structure 
and  implementation  structure”  [Boug86].  It  requires  hypotheses  relating  to  the  concepts  of  higher-  and 
lower-level  sorts  occurring  in  hierarchical  specifications.  In  [Choq86],  these  hypotheses  are  given  as: 


1.  The  regularity  hypothesis  for  a  level  n  states  that  if  the  test  is  successful  for  data 
of  complexity  less  than  n  [where  a  level  of  complexity  is  associated  with  each 
member  of  an  input  subdomain],  then  the  program  behaves  correctly  for  any 
value. 

2.  The  uniformity  hypothesis  states  that  if  the  test  is  successful  for  one  datum  in  a 
subdomain  then  the  program  behaves  correctly  for  any  data  in  this  subdomain. 

The approach also requires that specifications be translated into, or treated as, logic programs, and that
the search strategy be controlled by a logic interpreter.
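
The effect of the two hypotheses on test selection can be sketched as follows. This is an illustration
only: it is written in Python rather than as a logic program, it does not reproduce the procedure of
[Choq86], and the function under test, the subdomains, and the use of list length as the measure of
complexity are all assumptions made for the example.

```python
from itertools import combinations_with_replacement


def select_uniform(subdomains):
    """Uniformity hypothesis: one datum stands for its whole subdomain."""
    return [next(iter(sub)) for sub in subdomains]


def select_regular(alphabet, n):
    """Regularity hypothesis: test only sorted lists of length less than n."""
    return [list(c)
            for size in range(n)
            for c in combinations_with_replacement(sorted(alphabet), size)]


def insert_sorted(x, xs):
    """Hypothetical implementation under test: insert x into sorted list xs."""
    for i, y in enumerate(xs):
        if x <= y:
            return xs[:i] + [x] + xs[i:]
    return xs + [x]


def satisfies_spec(x, xs):
    """The specification predicts the expected result for each input."""
    return insert_sorted(x, xs) == sorted(xs + [x])


# Subdomains for inserting into [2, 8]: below, between, and above the elements.
SUBDOMAINS = [range(-3, 2), range(3, 8), range(9, 12)]
```

Here select_uniform(SUBDOMAINS) reduces eleven candidate values to three, and select_regular([0, 1], 3)
yields the six sorted lists of length less than 3. The confidence gained is only as strong as the
hypotheses themselves.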

As  with  DAISTS,  testing  consists  of  verifying  that  an  implementation  satisfies  each  axiom  of  the 
specification.  In  Choquet’s  approach,  however,  the  specification  of  a  function  is  used  to  produce  input 
data  for  that  function  and  predict  the  expected  result  of  the  function.  Test  data  sets  are  generated  for  each 
axiom and collected together to form one test data set for the whole specification. Early test data
generation algorithms produced extensive data sets due to problems in applying the hypotheses. These have
been replaced by a more efficient procedure utilizing constraints on variables to delimit uniformity
subdomains [Choq86]. This improves the implementation of the uniformity hypothesis and so simplifies the
test data generated. A prototype tool implemented in Quintus Prolog is under testing at the Université de
Paris-Sud.


3.2  Techniques  for  Dynamic  Analysis  of  Concurrent  and  Real-Time  Programs 

As previously stated, testing of concurrent programs invariably starts with testing each task as an
independent unit using sequential program testing techniques. The tasks are then tested jointly, primarily
to examine their synchronization behavior. There are, as yet, few disciplined techniques for this latter
form of testing. Moreover, since synchronization patterns are partially determined by a scheduler at
run-time, and are thus sensitive to timing, dynamic techniques may not be able to detect all existing faults.

Researchers at the University of California (Irvine) have developed algorithms for applying structural
testing, based on control and data flow, to concurrent programs [ThylSSa]. In addition to detecting data
flow anomalies, these algorithms can detect synchronization faults such as waiting for an unscheduled
process, waiting for a process guaranteed to have already terminated, and scheduling a process in parallel
with itself. As with sequential programs, structural testing of concurrent programs can be used both as a
test data generation strategy and as a measure of the adequacy of completed testing. In this latter role,
data on concurrency state coverage, state transition coverage, and synchronization coverage for concurrent
Ada programs is collected and correlated over a set of test runs. Information on concurrency states and
concurrency histories is used, in conjunction with symbolic evaluation techniques, to guide test data
generation.

Recognizing that concurrency-related faults have historically proven very difficult to test for,
researchers at Stanford University have developed an alternative approach. They propose a type of
self-checking Ada program which provides run-time detection of, and recovery from, faults. This approach
exploits two annotation languages which are used to include assertions specifying correct behavior in the
program code. ANNotated Ada (ANNA) [Luck84a] provides assertions on statements, variables, and program
units, whereas the Task Sequencing Language (TSL) [Luck87] is used to specify constraints to be satisfied
by the sequences of tasking events occurring in the execution of the program. The annotations given in
these languages are automatically transformed into run-time checks that monitor the consistency of the
behavior of the program with its formal specification. (In the case of ANNA, optimized checks and a
limited variety of proof techniques to analyze the consistency of checks prior to run-time are also
available.) The run-time checking against specifications can be executed in parallel with the underlying
Ada, reducing the overhead involved and allowing the checks to be a permanent part of the Ada program.
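
A run-time check on sequences of tasking events can be sketched as below. The sketch is in Python, not
in TSL or ANNA, and the constraint chosen (acquire and release events on a shared resource must strictly
alternate, with release issued by the current holder) is a hypothetical example rather than one drawn
from [Luck87].

```python
class SequenceMonitor:
    """Checks a stream of (task, event) pairs against one ordering constraint."""

    def __init__(self):
        self.holder = None     # task currently holding the resource
        self.violations = []   # (task, event, reason) for each broken constraint

    def observe(self, task, event):
        if event == "acquire":
            if self.holder is not None:
                self.violations.append((task, event, f"held by {self.holder}"))
            else:
                self.holder = task
        elif event == "release":
            if self.holder != task:
                self.violations.append((task, event, "not the holder"))
            else:
                self.holder = None


def check(trace):
    """Run a whole trace through a fresh monitor and return its violations."""
    monitor = SequenceMonitor()
    for task, event in trace:
        monitor.observe(task, event)
    return monitor.violations
```

In a self-checking program the calls to observe would be generated from the annotations and issued as
the events occur, rather than replayed from a recorded trace as check does here.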

The current version of TSL, TSL-1, can be used to aid the design of Ada tasking programs, but is
primarily intended to support testing and debugging. Constructs for defining abstract tasks will be added
to TSL-1 to develop a new language, TSL-2, suitable for specifying distributed systems prior to their
implementation.

3.3  Specific  Testing-Related  Automation  Issues 

Although  the  majority  of  dynamic  analysis  techniques  are  supported  by  automated  tools,  these  tools 
are  largely  research  vehicles  or  prototypes.  The  lack  of  widely  available,  production  quality  tools  is  one  of 
the  factors  contributing  to  the  lag  in  transitioning  state-of-the-art  techniques  into  common  practice. 
While  this  deficiency  must  be  rectified,  development  of  individual,  stand-alone  tools  is  not  the  solution.  It 
is unlikely that a single testing technique will ever be sufficient to guarantee reliable software. Current
evidence indicates that a simple succession of techniques, where each is applied independently of its
predecessors and successors, is neither effective nor efficient. Comprehensive testing environments which
support integrated application of a variety of techniques are critically needed. In this context, integrated
means  that  each  technique  builds  on  (partial)  information  gathered  by  previous  techniques  and  may  itself 
provide  information  to  be  used  by  later  techniques.  On-going  efforts  investigating  the  development  of 
environments  which  support  the  cooperative  application  of  dynamic  and  static  techniques  are  discussed 
in  Section  4.6. 

There  are,  however,  a  number  of  automation  issues  which  are  specific  to  dynamic  analysis.  These  are 
discussed below.

3.3.1 Coverage Analyzers

Coverage  analyzers,  also  called  coverage  monitors,  record  information  about  the  structural  elements 
executed  during  a  program  run.  Coverage  analyzers  for  sequential  software  have  been  available  for  many 
years. Those for concurrent and real-time software, however, lie in the realm of research rather than
practice. The significant problem that occurs in the case of concurrent and real-time software is that the
coverage must reflect all possible timing relations, of which there may be an infinite number.

Research into coverage analyzers for concurrent software is following two different approaches.
Program transformation techniques add logic to a concurrent program to record pertinent information about
its execution. Alternatively, specialized run-time systems can be developed by modifying the run-time
scheduler to directly record the requisite information. Work in the area of program transformation has
been pioneered by German, Luckham, Helmbold [Germ82a], and Rosenblum [RoseSSb]. They have
developed techniques for transforming a concurrent Ada program into an equivalent program that
exhibits the same behavior but also records the state transitions through which the program progresses.
The drawback of this type of approach is the increased overhead which arises from extra entry calls and
the monitor task used to record all tasking activities. This overhead is generally unacceptable for
real-time software where any instrumentation of the code can significantly alter the timing
characteristics and, hence, behavior of the program. One possible solution for this problem is to use a
separate processor to monitor and record information about the program execution.
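
The program transformation approach can be sketched in miniature. The example uses Python threads in
place of Ada tasks, and the Monitor class, the state names, and the run helper are hypothetical; the
sketch is not a rendering of the cited transformations.

```python
import threading


class Monitor:
    """Collects (task, state) records reported by instrumented tasks."""

    def __init__(self):
        self._lock = threading.Lock()
        self.log = []

    def record(self, task, state):
        with self._lock:   # every report is an extra synchronization point
            self.log.append((task, state))

    def transitions(self, task):
        """State transitions observed for one task, in order."""
        states = [s for t, s in self.log if t == task]
        return list(zip(states, states[1:]))


def instrumented_task(name, monitor, steps):
    """Transformed task: the original steps plus a report at each one."""
    monitor.record(name, "started")
    for step in steps:
        monitor.record(name, step)   # the task's real work would follow here
    monitor.record(name, "terminated")


def run(tasks):
    """Start one thread per task, wait for all of them, return the monitor."""
    monitor = Monitor()
    threads = [threading.Thread(target=instrumented_task, args=(n, monitor, s))
               for n, s in tasks.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return monitor
```

The order of records within one task is fixed, but the interleaving between tasks may differ from run
to run. The lock acquired on every report is a small-scale instance of the overhead described above.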

3.3.2  Debuggers 

Debuggers  aid  in  localizing  software  faults  by  allowing  the  software  developer  to  stop  the  execution  of  a 
program  at  interesting  points  and  examine  the  value  of  such  items  as  program  counters  and  variables.  An 
interactive  debugger  allows  the  developer  to  step  through  a  program,  repeatedly  stopping  and  restarting 
the  execution,  and  potentially  setting  and  clearing  monitors  as  necessary.  As  with  coverage  analyzers, 
many debuggers are available for sequential programs. There is, however, a lack of quantitative
information about the capabilities and relative values of these tools that must be resolved before
specific tools can be recommended for widespread practice.

The indeterminacy of events in concurrent programs poses a significant problem for debugging. The
behavior of the program is potentially affected by factors out of the software developer’s control.
Consequently, it may be extremely difficult to reproduce the circumstances that led to a failure and
diagnose its cause. Moreover, the debugger itself may interfere with the execution time and other run-time
characteristics (such as paging behavior), resulting in modified patterns of interaction between the
processes being executed. Collecting information which is useful in diagnosing faults is also non-trivial,
since a failure may not actually occur until the program execution is severely corrupted. In consequence,
debuggers for concurrent software must be more closely intertwined with testing activities than their
sequential counterparts.

Researchers distinguish between two types of concurrency: multiple processes executing on a single
processor, and concurrent processes executing on several processors. In the first case, a software
developer can look at the order in which events occur to determine the causes of failures, and can stop all
processes at the same instant when necessary. An example of work in this area is that being conducted by
Helmbold and Luckham at Stanford University [Helm83]. These researchers differentiate between possible
types of Ada tasking faults. Task sequencing faults occur when a program’s tasks interact in a different
order than anticipated. Deadness faults occur when a task communication failure prevents part of a
concurrent computation from proceeding. Whereas most deadness faults can be detected by analysis of the
computation of a program, task sequencing faults often require additional information pertaining to the
intended behavior of the program. Helmbold and Luckham have developed a prototype tool which
detects and analyzes deadness faults. Program transformation techniques are used to modify each task to
inform a special monitor task about tasking actions about to be performed. Based on this information, the
monitor maintains a continually updated picture about the program’s tasking state, which is checked for
deadness faults whenever new information is received. To aid in debugging, the monitor can print
snapshots of the tasking picture and trace task interactions. The interactive interface to the tool allows
the software developer to single-step through task interactions and request diagnostic information as
needed.
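
One way such a monitor might check its tasking picture is sketched below. The sketch covers only a
single kind of deadness fault, a circular wait among blocked tasks, and assumes each blocked task waits
on exactly one other task. The class and method names are hypothetical and the code is Python, not the
Ada of the cited tool.

```python
class TaskingMonitor:
    """Maintains a picture of which task each blocked task is waiting on."""

    def __init__(self):
        self.waiting_on = {}   # blocked task -> task it is waiting for

    def about_to_wait(self, task, target):
        """Called by an instrumented task just before it blocks on target."""
        self.waiting_on[task] = target
        return self.find_cycle(task)

    def resumed(self, task):
        """Called when a task stops waiting."""
        self.waiting_on.pop(task, None)

    def find_cycle(self, start):
        """Return the circular wait that contains start, or None."""
        path, current = [start], self.waiting_on.get(start)
        while current is not None and current not in path:
            path.append(current)
            current = self.waiting_on.get(current)
        return path if current == start else None
```

The picture is rechecked on every report, so a circular wait is flagged at the moment the last task in
the cycle announces that it is about to block.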

Future extensions to the TSL monitor will provide the capability to monitor user specified assertions
about task events and so facilitate detection of task sequencing faults [HelmSS]. This is part of an overall
effort to develop a testbed for specific kinds of Ada tasking programs, such as run-time schedulers for
Ada tasking on multiple processors. This effort includes the development of a monitor/debugger for TSL
on multiprocessor systems (Sequent Symmetry), and the definition of debugging techniques utilizing TSL
specifications and the TSL monitor.

In the second case, where processes execute on multiple processors, the order in which events occur
can only be determined by looking at the communications between the processes. These communications
must be disabled to suspend all processes at the same relative time (there is a communication delay
problem which makes it impossible to suspend processes at the same actual time). A common approach for
this class of debuggers is to use a hierarchy of tools consisting of a traditional debugger to aid in testing
each process independently and a higher-level debugger to monitor communications. A review of
debuggers for concurrent programs executing on different processors is given in [Gord88].

In [Gord88], Gordon and Finkel also describe their tool TAP which is implemented on the Charlotte
[Fink83] distributed operating system. TAP is used to detect timing faults which are caused by
misordering of the communication events between processes. It maintains a history of the communication
events that occur during software execution as a directed, acyclic graph, called a timing graph. The nodes
in a timing graph represent events, whereas arcs from one node to another indicate the temporal
precedence between events. TAP is designed to be always active, thus overcoming the problem of the
debugger changing the behavior of the program being monitored and potentially masking faults. Using
TAP, a skeleton timing graph is continually built during the program execution. When a fault is
encountered, TAP suspends all processes, constructs the full timing graph, and waits for instructions from
the user. It then aids in diagnosing the cause of the fault by allowing the user to backtrack through the
timing graph to examine the order in which events occurred and the contents of messages. Since, unlike
most debuggers of this class, TAP can analyze communication events after the fact and does not require
active control over the execution, it can be used both during program testing and to find timing faults in
operational software.
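
The notion of a timing graph can be illustrated with a small sketch. It is not TAP's data structure or
interface; the TimingGraph class, its method names, and the event labels are assumptions made for the
example, which is written in Python.

```python
from collections import defaultdict


class TimingGraph:
    """Directed acyclic graph of communication events."""

    def __init__(self):
        self.preds = defaultdict(set)   # event -> events directly preceding it
        self.last = {}                  # process -> its most recent event

    def add_event(self, event, process, caused_by=None):
        """Record event on process; caused_by is the matching send, if any."""
        self.preds[event]   # make sure the node exists
        if process in self.last:
            self.preds[event].add(self.last[process])
        if caused_by is not None:
            self.preds[event].add(caused_by)
        self.last[process] = event

    def history(self, event):
        """Backtrack from event to every event that must have preceded it."""
        seen, stack = set(), [event]
        while stack:
            for p in self.preds[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def precedes(self, a, b):
        """True if a is known to have occurred before b."""
        return a in self.history(b)
```

Two events with no path between them in either direction are unordered; their relative order can
differ between runs, which is where misordering faults arise.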

The worst case debugging problem occurs when concurrent processes must operate under real-time
constraints on a bare machine. Here the common practice is to develop and test the software on a host
machine where extensive development support is available and then perform limited testing on the target
machine. The target may provide no support for debugging activities. If a failure occurs during testing on
the target, it is often necessary to return to the host to isolate the cause. Even if the same scheduler
algorithm is used on both machines, different behaviors may occur due to differences in processor
construction. Examples of factors which cause the target execution to deviate from the host execution
include: a less (or more) precise real-time clock, real-time input simulators on the host that operate at a
different rate from the actual inputs to the target, and variations in the relative speed of the processors.
Taylor, at the University of California (Irvine), is investigating techniques to allow reconstruction of an
erroneous target execution at the source language level (Ada) on a host machine [Tayl82b]. Although
prototype tools are being developed, it will likely be several years before practical solutions to this
problem are available.

3.3.3 Built-In Test

Although nearer to the hardware aspect of instrumentation, the issue of built-in test (BIT) [Batt87]
must also be considered. Beginning in the late 1960s, troubleshooting of electronic components embedded
in weapon systems became too complex for traditional, manual approaches. As a result, diagnostic
probes began to be built into electronic components to allow easier detection of faults. These probes, or
diagnostics, must be designed into electronic components; they cannot be added as an afterthought. Most
DOD embedded weapon systems are designed with BIT as an operational requirement.

In a similar vein, a separate autonomous machine can be used to monitor system execution. Although
potentially expensive, this approach, theoretically, does not affect timing characteristics and can provide
“playback” for reproducing specific execution sequences. It can also be used to limit the execution
overhead incurred for self-checking code, or the dynamic monitoring provided by tools such as TAP. Support
for permanent self-test of concurrent software by downloading the checking of TSL specifications onto
spare processors is under investigation at Stanford University.

3.4 Summary of Major Gaps in Dynamic Analysis Technology

So  far  this  section  has  outlined  the  dynamic  analysis  technology  that  has  been,  or  is  being,  developed. 
Not  all  of  this  technology  is  currently  available  for  use,  or  would  be  suitable  for  SDS  software  efforts  were 
it  available.  But  what  technology  is  needed?  Other  sections  of  this  report  raise  the  question  of  how  all  the 
different forms of testing and evaluation should fit together, and how they should cooperate with
developmental activities. Here, a number of technology gaps that are particular to dynamic analysis and
independent of these larger issues can be identified. By and large, these gaps are not simple, independent
problems; the state-of-the-art is not yet at this point. Instead, these gaps reflect some of the fundamental
deficiencies  in  software  testing  and  evaluation. 

3.4.1  Need  for  Oracles 

Most  dynamic  analysis  techniques  require  some  means  for  determining  whether  the  program  output  is 
correct. Conceptually, at least, this implies the need for an oracle which can produce “correct” program
outputs  against  which  actual  outputs  can  be  compared.  In  practice,  humans  usually  play  the  part  of  the 
oracle,  although  this  can  result  in  high  testing  costs  and  is  often  inaccurate  for  all  but  small  programs. 

Some techniques resolve the lack of an oracle by creating formal, executable specifications to generate
the expected output. When the specifications are derived from the code itself, however, there is at least
the possibility of mirroring erroneous program assumptions in the created specification. Some researchers
have proposed using N-version programming to deduce correct outputs based on the consensus of the
N versions of a program. Unfortunately, recent research indicates some problems with the fundamental
N-version hypothesis that requires statistical independence between failures in the N versions of a
program (see Section 2.1.3).
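
A consensus oracle of this kind can be sketched as a simple majority vote. The sketch is in Python; the
function name and the three small program versions are hypothetical, and v3 and v4 deliberately share
the same fault so that the dependence on independent failures can be seen.

```python
from collections import Counter


def consensus_oracle(versions, test_input):
    """Return (majority output, indices of dissenting versions).

    If no output has a strict majority, return (None, all indices).
    """
    outputs = [version(test_input) for version in versions]
    value, votes = Counter(outputs).most_common(1)[0]
    if votes <= len(versions) // 2:
        return None, list(range(len(versions)))
    return value, [i for i, out in enumerate(outputs) if out != value]


# Hypothetical versions of an absolute-value routine.
def v1(x):
    return abs(x)


def v2(x):
    return x if x >= 0 else -x


def v3(x):
    return x   # fault: negative inputs are returned unchanged


def v4(x):
    return x if x > 0 else x   # the same fault, written differently
```

With versions v1, v2, and v3 and input -4, the vote yields 4 and singles out v3. With v1, v3, and v4 the
two faulty versions outvote the correct one and the oracle returns -4, which is exactly the situation
that correlated failures make possible.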

It  is  probable  that  no  adequate  solution  of  this  problem  will  be  achieved  until  increased  formalization 
of  early  lifecycle  activities  yields  sufficiently  formal  specifications  (embodying,  for  example,  first-order 
predicate  calculus)  to  define  software  processes.  Even  then,  the  difficulty  in  defining  all  correct 
sequences of events poses problems for concurrent and real-time software.

3.4.2 Completeness of Analysis

What  is  the  meaning  of  a  successful  test?  A  good  test  is  one  which  exposes  faults.  The  meaning  of  a  test 
in  which  the  software  executes  successfully  is  uncertain;  it  does  not  necessarily  indicate  the  absence  of 
faults. This is part of the larger question of completeness or, in other words, knowing what has been
achieved in testing activities. Although completeness in general applies to the overall testing and
evaluation activity, it is particularly troubling for dynamic analysis techniques. Unlike many other forms
of testing and evaluation, these techniques are not simply applied or not applied; they can be applied to
widely varying extents.

Much research is needed before this issue can be fully understood. Meanwhile, a very simplistic
solution is to tie the notion of completeness to the concept of minimum levels of testing coverage. (This is
an extremely limited translation of completeness and is proposed only as a stop-gap measure for practical
purposes; that is, it provides a step forward but falls a long way short of the ideal.) Wide acceptance of a
minimum level of testing coverage for all software systems has long been sought. This is starting to be
achieved as industry increasingly accepts branch testing as the minimum coverage requirement. In the
case  of  SDS,  however,  branch  testing  is  an  inadequate  across-the-board  minimum.  A  hierarchy  of 
minimum  coverage  requirements  must  be  determined  such  that  increasingly  severe  coverage  requirements 
can  be  mapped  against  increasingly  critical  software.  These  coverage  measures  must  address  different 
aspects  of  dynamic  analysis,  such  as  the  coverage  required  for  structural  and  data  flow  testing,  fault-based 
testing  adequacy  measures,  and  functional  testing  coverage. 

3.4.3  Assessment  of  Capabilities  of  Techniques 

The  last  several  years  have  seen  a  relative  increase  in  experimental  evaluations  of  dynamic  analysis 
techniques.  For  illustrative  purposes,  the  results  from  two  of  these  evaluations  are  reproduced  in  Figure 
3-3 and Table 3-2. Experiments such as these not only provide empirical evidence of the utility of
particular techniques for detecting certain types of errors or faults, but also some general guidance on
how, or which, techniques can support each other for more thorough analysis. While some of these
experiments were admittedly conducted on small, sample programs, others utilized realistic programs
intended for practical use. The chief handicaps in comparing the results across experiments remain: (1) the
continuing lack of a standardized error/fault categorization scheme, and (2) the limited volume of
experiments.

This work notwithstanding, practical descriptions of the error/fault detection capabilities and costs of
existing techniques are still unavailable. In a few cases, such as data flow testing and domain testing,
simple cost information pertaining to the number of test cases needed to achieve various levels of testing
coverage is available. This must be supported by data on the cost of generating and executing test cases
which, of course, is impacted by the available automated support tools and the underlying operating
environment. The cost to analyze test results can also be a pertinent factor. Even so, this information is
only meaningful in terms of the increased reliability these costs deliver. Since testing is a time consuming
and expensive task, it is important that “the point of diminishing returns” can be recognized in a timely
manner, as this applies to the case in hand.

Information on capabilities should specify how the performance of a technique varies for different
types of programs. For example, the use of certain control and data structures can severely degrade the
cost-effectiveness of certain techniques. Some techniques simply do not handle array references; arrays
are treated as a single variable and array elements are not differentiated. A series of weightings that
distinguish between programs in terms of their testing difficulty is needed. Assessment of a technique’s
effectiveness must also take into account any fundamental limitations or assumptions. For example,
coincidental correctness is a limiting assumption of many techniques. Coincidental correctness occurs
when a fault is executed but the program does not fail; thus an incorrect program may be falsely assumed
correct. Determining the absence (or presence) of coincidental correctness is generally undecidable, but
some new analysis techniques are beginning to address how to ensure that the effect of a triggered fault is
indeed transferred to the output. The relative costs and benefits of using techniques as either test data
generation strategies or adequacy measurement tools should also be investigated. This latter point is just
one part of the more general question pertaining to the cost-effectiveness of variously combined
applications of techniques.

Figure 3-3. Frequency of Errors Detected

Test Data Technique          # Test Cases   Mutants Killed   Mutation Score
Statement Analysis                  5            642              .67
Specification Analysis             13            780              .81
Branch Analysis                     9            749              .78
Minimized Domain Analysis          36            932              .968
Domain Analysis                    75            943              .980
Mutation Testing                  347            962             1.00

Table 3-2. Comparison Using the MOTHRA Mutation System
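
The arithmetic behind the last column of Table 3-2 can be checked with a short sketch. A mutation score
is taken here as the fraction of killable mutants that a test set kills. The table does not state the
number of killable mutants; the figure of 962 used below is an assumption, inferred from the row whose
score is 1.00, and the function names are hypothetical.

```python
# Rows of Table 3-2: technique -> (test cases, mutants killed, reported score).
TABLE_3_2 = {
    "Statement Analysis": (5, 642, 0.67),
    "Specification Analysis": (13, 780, 0.81),
    "Branch Analysis": (9, 749, 0.78),
    "Minimized Domain Analysis": (36, 932, 0.968),
    "Domain Analysis": (75, 943, 0.980),
    "Mutation Testing": (347, 962, 1.00),
}

# Assumption: the mutants killed at a score of 1.00 are all the killable ones.
KILLABLE = 962


def mutation_score(killed, killable=KILLABLE):
    """Fraction of killable mutants that were killed."""
    return killed / killable


def kills_per_test(technique):
    """Average number of mutants killed per test case for one table row."""
    cases, killed, _ = TABLE_3_2[technique]
    return killed / cases
```

Under this assumption each computed score agrees with the reported one to within 0.005. The second
function gives a rough view of cost: about 128 mutants killed per test case for statement analysis,
against fewer than 3 per test case for mutation testing.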

Taken as a whole, capability profiles would provide urgently needed guidance in the selection and
application of appropriate techniques for each testing effort. They would also be useful in determining
which techniques should be provided by a comprehensive testing environment. Both theoretical and
empirical evaluations of techniques are needed to acquire the necessary data; theoretical studies to provide
insight into the power and effectiveness of techniques, and empirical trials to reveal the ease of use of a
technique. Experimental studies are particularly important in dynamic analysis research since worst case
analysis often gives very pessimistic upper bounds that do not satisfactorily reflect practical performance.

3.4.4 Integrated Application of Techniques

No  single  analysis  technique  is  sufficient  to  ensure  highly  reliable  software.  Researchers  are  discovering 
that  a  simple  sequence  of  several  different  techniques  applied  independently  does  not  necessarily  increase 
error/fault  detection  capabilities.  Moreover,  many  techniques  rely  on  much  of  the  same  basic  information 
(such  as  control  and  data  flow  patterns),  and  sequential  application  of  techniques  is  highly  inefficient  as 
the  same  information  must  be  generated  repetitively.  The  relationship  between  techniques  must  be  closely 
examined  to  identify  those  which  best  support  one  another,  or  even  subsume  others.  Powerful  and 
efficient  testing  environments  cannot  be  built  until  such  integrated  techniques  are  available.  At  a  higher 
level of concern, the relationship between the different forms of testing and evaluation needs to be
studied. The distinction between, for example, dynamic analysis, static error analysis, and formal
verification  is  narrowing  as  symbolic  evaluation  is  increasingly  a  necessary  precursor  for  each. 

These  problems  are  well-recognized  and  several  researchers  are  examining  approaches  for  integrating 
dynamic  analysis  techniques  with  some  forms  of  static  analysis.  Hopefully,  integrated  techniques  will  start 
emerging  in  a  couple  of  years.  Meanwhile,  the  larger  picture  of  integration  must  not  be  ignored. 

3.4.5  Analysis  of  Concurrent  and  Real-Time  Software 

There is an acute lack of techniques for dynamic analysis of concurrent and real-time software.
Although there is an emerging set of static techniques, only dynamic approaches are able to detect
problems arising from the execution environment of the software.

Unlike the foundation provided by graph theory for analysis of sequential programs, there is no
commonly accepted theoretical base for analysis of concurrent and real-time software. Fundamental research
is needed to develop a firm foundation from which technology can evolve. There are a few instances of
such work, for example that being conducted by Weiss [WeisSSa]. At the current time, it would be
premature to select just one of the emerging approaches and concentrate research resources in that
direction. Multiple research avenues must be pursued with plenty of cross-fertilization and dissemination
of information between efforts. In view of its dependence on the operating environment, this research
should include efforts which take a combined view of software and hardware concerns.

4. STATIC ANALYSIS TECHNOLOGY

There  are  many  different  types  of  static  analysis.  For  practical  purposes,  formal  verification  techniques 
and  techniques  oriented  towards  measurement  of  critical  software  properties  are  discussed  separately,  in 
Sections  5, 6,  and  7. 

Remaining static analysis techniques can be divided into four groups. The first group consists of those
techniques which produce general information about a program, for example, symbol cross-referencers,
rather than search for actual faults. These are relatively common and often provided by a compiler. The
second group, static error analysis³ techniques, are designed to detect specific classes of faults or
anomalous constructs in a program. They focus on type and units analysis, reference analysis, expression
analysis, and interface analysis. While some types of static error analysis can be automated, others are
restricted to manual application. In contrast, the third group, symbolic evaluation techniques, are
entirely automated. The final group consists of manual review techniques, namely code inspections and
structured walkthroughs. This section concentrates on the latter three categories and discusses the
state-of-the-art in these types of static analysis for sequential programs, concurrent and real-time
programs, and pre-implementation products.

The  final  subsections  summarize  the  major  technological  gaps  in  this  area  and  review  the  current  trends 
in  automated  support  for  both  dynamic  and  static  analysis. 

4.1 Techniques for Static Analysis of Sequential Programs

Data flow analysis was one of the earliest static error analysis techniques and focuses on the detection
of violations of sequencing constraints. It was derived from work in compiler code optimization where
information is gathered to increase code efficiency by eliminating unnecessary computations. Initial work
on data flow analysis was conducted by Fosdick and Osterweil. They defined a data flow anomaly as “a
sequence of the events reference (r), definition (d), and undefinition (u) of variables in a program that
is either erroneous in itself or often symptomatic of an error” [Fosd76a]. A software developer is
required to specify all desirable and required sequences of events; the program is then analyzed to
detect violations, which may arise from omitted and superfluous code errors.
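
The notion of an anomalous event sequence can be illustrated with a minimal sketch. It assumes a single straight-line path whose events have already been extracted, and it checks only three event pairs (ur, dd, du) that are commonly treated as anomalous in this style of analysis. The function name, the trace format, and the wording of the messages are illustrative assumptions and are not taken from [Fosd76a].

```python
# Illustrative sketch only: scan one execution path for data flow anomalies.
# Events per variable: 'd' = definition, 'r' = reference, 'u' = undefinition.
ANOMALOUS_PAIRS = {
    "ur": "reference to an undefined variable",
    "dd": "redefinition with no intervening reference",
    "du": "definition never referenced before undefinition",
}

def find_anomalies(path_events):
    """path_events: list of (variable, event) pairs in execution order."""
    last_event = {}   # variable -> most recent event; variables start undefined
    anomalies = []
    for position, (var, event) in enumerate(path_events):
        pair = last_event.get(var, "u") + event
        if pair in ANOMALOUS_PAIRS:
            anomalies.append((position, var, pair, ANOMALOUS_PAIRS[pair]))
        last_event[var] = event
    return anomalies

# Hypothetical path: x is defined twice, y is read before any definition,
# and z is defined and then discarded without being read.
example_path = [("x", "d"), ("x", "d"), ("y", "r"),
                ("x", "r"), ("z", "d"), ("z", "u")]
for finding in find_anomalies(example_path):
    print(finding)
```

A practical analyzer would propagate this information over all paths of a flowgraph rather than one trace; the single-path form is used here only to keep the idea visible.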

Numerous notations for specifying sequencing constraints have been developed. These are based, for
example, on the path expressions of Habermann [Camp79,Kieb83]; finite state machines [Howd83]; flow
expressions (an extension of regular expressions) [Shaw78]; and axiomatic specifications [Gutt78a].
Various tools for applying data flow analysis have been developed [Fosd76a,Brow78,Conr85], mostly for
analyzing Fortran programs. Data flow analysis has also been proposed as a way of guiding test data
selection; the resulting data flow testing techniques are discussed in Section 3.1.1. Much of the early
data flow analysis work focused on techniques and tools for the detection of fixed, and frequently
limited, classes of data flow conditions. In the last few years, Olender and Osterweil have developed a
more flexible mechanism for specifying a variety of event sequencing problems, which can be mechanically
translated into algorithms capable of solving these problems [Olen86].
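
As a rough illustration of checking a user-specified sequencing constraint, the sketch below expresses the constraint as a finite state machine and propagates the set of reachable machine states over a small control flow graph until nothing changes. The constraint (open before read or close), the graph encoding, and all names are hypothetical; this is not the notation or algorithm of [Olen86] or of any tool cited above.

```python
# Illustrative sketch: check an event sequencing constraint, given as a finite
# state machine, against every path through a small control flow graph.
from collections import deque

# Hypothetical constraint: a file must be opened before it is read or closed.
FSM = {
    ("closed", "open"): "opened",
    ("opened", "read"): "opened",
    ("opened", "close"): "closed",
}

def check_sequencing(cfg, events, entry, start_state="closed"):
    """cfg: node -> list of successor nodes; events: node -> event name.
    Returns a sorted list of (node, state, event) violations."""
    states_in = {node: set() for node in cfg}   # FSM states reaching each node
    states_in[entry].add(start_state)
    worklist = deque([entry])
    violations = set()
    while worklist:
        node = worklist.popleft()
        event = events.get(node)                # None: node generates no event
        out = set()
        for state in states_in[node]:
            if event is None:
                out.add(state)
            elif (state, event) in FSM:
                out.add(FSM[(state, event)])
            else:
                violations.add((node, state, event))   # illegal on some path
        for succ in cfg[node]:
            if not out <= states_in[succ]:      # new states reach the successor
                states_in[succ] |= out
                worklist.append(succ)
    return sorted(violations)

# Hypothetical graph: the file is closed on one branch only, then read.
cfg = {"n1": ["n2"], "n2": ["n3", "n4"], "n3": ["n5"], "n4": ["n5"], "n5": []}
events = {"n1": "open", "n3": "close", "n5": "read"}
print(check_sequencing(cfg, events, "n1"))
```

Because the analysis considers every syntactic path, it may also report violations on paths that can never execute, which is the limitation noted later in this section for static techniques in general.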

Another common static error analysis technique is interface analysis. In [Howd87], Howden identifies
three levels of interface analysis: module-interface analysis, data-interface analysis, and
operator-interface analysis. Module-interface analysis is the highest level of interface analysis. It is
used to analyze the interfaces between system objects for consistency, completeness, and redundancy.
Data-interface analysis, the next level of analysis, exploits user provided descriptions of the expected
transformations to examine the transformations of one type of data into another that occur within a
program module. Finally, the lowest level of analysis, operator-interface analysis, analyzes data
structure operators to determine whether (1) these operators are applied to appropriately typed objects,
and (2) the sequence of operators is, in fact, legal. (Of course, these types of interface analyses are
automatically provided by compilers for languages such as Ada which provide encapsulation and strong
typing.)

3. As is the case with various aspects of error-based and fault-based testing, the term static error
analysis is a historical artifact. These techniques would be more properly categorized as static fault
analysis techniques.

During empirical studies of large systems, Howden noted that the significant problems in these systems
arise from the difficulty of keeping track of data, rather than the more specific types of faults such as
computation faults. This led to the development of a new analysis technique for detecting decomposition
errors, called flavor analysis [HowdS?]. Here flavors denote the meaning of particular variables and can
be used in much the same way as type information is used to determine proper variable usage. In order to
detect incorrect assumptions (concerning variable names or missing code, for example), flavor statements
are included in the code and subsequently analyzed for consistency. Howden describes two types of flavor
statements, those which summarize (1) current assumptions about flavors, and (2) the flavor effects of
statements or sections of code [Howd89]. Although flavor statements can be translated into both
compile-time and run-time checks, Howden states that their primary use is in static detection of false
assumption decomposition errors. An initial flavor analysis tool has been developed. An advanced tool
which can check scheduling and temporal assumptions is planned to support flavor analysis of concurrent
and real-time software.

Symbolic  evaluation  is  a  technique  whereby  a  program  is  executed  over  symbolic  rather  than  actual 
data. Symbolic values are assigned to input variables and the user selects a path through the program to be
evaluated.  Each  expression  along  this  path  is  evaluated  by  substituting  symbolic  values  for  the  variables. 
Thus,  symbolic  evaluation  can  be  used  to  analyze  the  conditional  branching  predicates  of  a  program,  in 
addition  to  the  output  computations.  The  approach  can  be  varied  by  symbolically  evaluating  a  program 
over  a  mixture  of  actual  and  symbolic  data,  or  examining  the  traces  of  intermediate  symbolic  values 
assigned  to  variables. 
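
The path-oriented evaluation described above can be sketched as follows. The sketch assumes a tiny expression representation (nested tuples) and a path supplied as a list of assignments and branch assumptions; these encodings and the function names are hypothetical and do not correspond to any particular symbolic evaluation system.

```python
# Illustrative sketch: symbolically evaluate one user-selected program path.
# Expressions are nested tuples (op, left, right), variable names, or constants.

def substitute(expr, env):
    """Replace program variables in expr by their current symbolic values."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return (op, substitute(left, env), substitute(right, env))
    if isinstance(expr, str):
        return env.get(expr, expr)   # names with no value yet stay symbolic
    return expr                      # numeric constant

def show(expr):
    """Render a symbolic expression as fully parenthesized text."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return f"({show(left)} {op} {show(right)})"
    return str(expr)

def evaluate_path(path, inputs):
    """path: list of ('assign', var, expr) or ('assume', predicate) steps."""
    env = {name: name.upper() for name in inputs}   # input x gets symbol X
    path_condition = []
    for step in path:
        if step[0] == "assign":
            _, var, expr = step
            env[var] = substitute(expr, env)
        else:                                       # branch taken on this path
            path_condition.append(substitute(step[1], env))
    return env, path_condition

# Hypothetical path through:  y = x + 1;  if y > 10:  z = y * 2
path = [("assign", "y", ("+", "x", 1)),
        ("assume", (">", "y", 10)),
        ("assign", "z", ("*", "y", 2))]
env, condition = evaluate_path(path, inputs=["x"])
print(show(env["z"]))        # prints ((X + 1) * 2)
print(show(condition[0]))    # prints ((X + 1) > 10)
```

The symbolic output for z can be compared against a specification, while the collected branch predicates describe the subset of the input domain that drives execution down this path.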

Symbolic evaluation is used for a variety of purposes. In symbolic testing [Howd77a], symbolic values
of output variables are generated and compared (usually manually) against a program specification to
identify faults. Symbolic systems of predicates for a program path can be used to analyze the subsets of
the input domain that cause particular program paths to be executed; test data to execute these paths can
also be generated [Rich88a]. In this capacity, symbolic evaluation techniques are an essential element of
many of the dynamic analysis approaches discussed previously. They also provide the capability to detect
infeasible paths, although rules for determining feasibility cannot detect all such paths. Symbolic
evaluation is also used in proofs of correctness. Meanwhile, its potential continues to be investigated.
Perhaps the most recent innovative application of symbolic evaluation is Morell’s approach to combining
symbolic evaluation with mutation testing, yielding a fault-based dynamic analysis technique with both
global extent and infinite breadth (see Section 3.1.3).

4.2 Techniques for Static Analysis of Concurrent and Real-Time Programs

Taylor and Osterweil have extended data flow analysis techniques to detect faults and anomalies in
concurrent programs [Tayl80a]. The same classes of variable usage faults sought in sequential programs
can be found in concurrent programs, for example, referencing an uninitialized variable. A number of data
flow faults unique to concurrent programs can also be detected. In this latter case, Taylor and Osterweil
have developed algorithms for detecting the following faults and anomalies: (1) waiting for an
unscheduled process, (2) scheduling a process in parallel with itself, (3) waiting for a process
guaranteed to have previously terminated, (4) referencing a variable which is being defined by a parallel
process, and (5) referencing a variable whose value is indeterminate. Both interprocess and
interprocedural data flow are analyzed, based on process augmented flowgraphs (PAFs) which provide a
graph representation of a system of communicating concurrent processes.
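
The first three anomalies in this list can be illustrated on a single path of schedule and wait events. The sketch below is a deliberate simplification: it assumes one linear event trace rather than a process augmented flowgraph, and the event names, state names, and function name are hypothetical rather than taken from [Tayl80a].

```python
# Illustrative sketch: detect process scheduling anomalies along one path.
# Each event is ('schedule', process) or ('wait', process).

def check_process_events(events):
    state = {}      # process -> 'active' or 'terminated'; absent = unscheduled
    findings = []
    for index, (action, proc) in enumerate(events):
        current = state.get(proc, "unscheduled")
        if action == "schedule":
            if current == "active":
                findings.append((index, proc, "scheduled in parallel with itself"))
            state[proc] = "active"
        elif action == "wait":
            if current == "unscheduled":
                findings.append((index, proc, "wait for an unscheduled process"))
            elif current == "terminated":
                findings.append((index, proc, "wait for a process already terminated"))
            else:
                state[proc] = "terminated"   # a completed wait implies termination
    return findings

# Hypothetical trace containing one instance of each anomaly.
trace = [("wait", "p"), ("schedule", "q"), ("schedule", "q"),
         ("wait", "q"), ("wait", "q")]
for finding in check_process_events(trace):
    print(finding)
```

Anomalies (4) and (5) depend on which variable definitions may execute in parallel, which requires the interprocess flow information that this single-trace sketch omits.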

Another area being investigated by Taylor is static concurrency analysis. This technique identifies
anomalous synchronization patterns in concurrent programs through state-based program analysis
techniques. Unlike dynamic analysis approaches, static concurrency analysis can potentially examine all
possible synchronization patterns. It does, however, suffer some disadvantages. Static analysis cannot
identify all infeasible paths and so may report spurious faults involving these paths. Moreover, static
concurrency analysis has been shown to be NP-complete and is only practically useful for programs
comprising a relatively small number of processes. As with all static approaches, it is weak in dealing
with dynamically identified objects such as array elements and pointers. With respect to Ada, this
includes task entry families and tasks that are components of dynamically allocated data objects.
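
A state-based analysis of synchronization patterns can be sketched as an exhaustive search over concurrency states. The sketch assumes each task is a fixed sequence of rendezvous actions (an entry call or an accept), that entries are identified by a global name, and that a state is the tuple of task program counters. These modeling choices and all names are hypothetical simplifications, not Taylor's algorithm.

```python
# Illustrative sketch: enumerate the concurrency states of a set of tasks that
# synchronize by rendezvous, and report states in which no task can proceed.

def successors(tasks, state):
    """Yield states reachable by one rendezvous (a call matched by an accept)."""
    for i, pc_i in enumerate(state):
        if pc_i == len(tasks[i]) or tasks[i][pc_i][0] != "call":
            continue
        for j, pc_j in enumerate(state):
            if j == i or pc_j == len(tasks[j]):
                continue
            if tasks[j][pc_j] == ("accept", tasks[i][pc_i][1]):
                nxt = list(state)
                nxt[i] += 1
                nxt[j] += 1
                yield tuple(nxt)

def find_deadlocks(tasks):
    """Return (sorted deadlock states, number of states explored)."""
    start = tuple(0 for _ in tasks)
    final = tuple(len(t) for t in tasks)
    seen, stack, deadlocks = {start}, [start], []
    while stack:
        state = stack.pop()
        nexts = list(successors(tasks, state))
        if not nexts and state != final:
            deadlocks.append(state)        # at least one task is blocked forever
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(deadlocks), len(seen)

# Hypothetical example: each task calls the other before accepting.
crossed = [[("call", "a"), ("accept", "b")],
           [("call", "b"), ("accept", "a")]]
print(find_deadlocks(crossed))
```

The number of states explored grows with the product of the task lengths, which gives a small-scale view of why the approach is practical only for a relatively small number of processes.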

Together with other researchers at the University of California (Irvine), Taylor has developed an
approach for mitigating some of these problems. Here static concurrency analysis is combined with
symbolic execution, by interleaving phases of the two techniques [Youn86a]. This interleaving allows
symbolic execution to prune away the infeasible paths otherwise identified by concurrency analysis, and
concurrency analysis to support symbolic execution by selecting paths leading to possible
concurrency-related faults. Tools to apply this approach to Ada programs are under development.

Another group of researchers, at Stanford University, are developing a language for the specification of
distributed Ada systems that will facilitate both static and dynamic analysis [Luck87]. TSL (see Section
3.2) is intended to provide rigorous investigation of concurrent programs without the overhead imposed
by formal verification. The first version of TSL was developed to explore the underlying concepts
involved in the specification of concurrent systems. Based on their success in so doing, a more
general-purpose, second version of the language (TSL-2) has been developed, together with some automated
analysis tools.

4.3  Techniques  for  Static  Analysis  of  Pre-Code  Products 

Data flow, concurrency, and interface analysis are not restricted to code products. Assuming that
appropriately formalized representations are used, they can also be applied to pre-implementation
products. For example, researchers at the University of Massachusetts have been investigating issues
revolving around the description, enforcement, and analysis of relationships between system components.
Their approach to providing Precise Interface Control is called PIC [Wolf86c]. PIC extends the visibility
concepts of declaration, scope, and binding which underlie traditional interface analysis by
distinguishing between two types of visibility. These are requisition of access (which occurs when an
entity requests the right to make reference to, or use, some set of entities) and provision of access
(which occurs when an entity grants the right of reference to some set of entities).
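
The distinction between requisition and provision of access lends itself to a simple consistency check, sketched below. The module table, its field names, and the two problem categories are hypothetical illustrations of the idea and do not reproduce PIC notation or the analyses of [Wolf86c].

```python
# Illustrative sketch: check that every requisition of access is matched by a
# provision of access. The module table below is a hypothetical example.

def check_access(modules):
    """modules: name -> {'provides': {entity: set of grantees},
                         'requests': set of (provider, entity)}."""
    problems = []
    for requester, spec in sorted(modules.items()):
        for provider, entity in sorted(spec["requests"]):
            grantees = modules.get(provider, {}).get("provides", {}).get(entity)
            if grantees is None:
                problems.append((requester, provider, entity, "not provided"))
            elif requester not in grantees:
                problems.append((requester, provider, entity, "not granted"))
    return problems

modules = {
    "stack": {"provides": {"push": {"parser"}, "pop": {"parser"}},
              "requests": set()},
    "parser": {"provides": {},
               "requests": {("stack", "push"), ("stack", "top")}},
    "logger": {"provides": {},
               "requests": {("stack", "pop")}},
}
for problem in check_access(modules):
    print(problem)
```

In this example the logger requests an entity that is provided only to the parser, and the parser requests an entity that the stack never provides.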

The PIC approach supports a variety of analyses. Basic interface analysis examines type and
requisition/provision information to determine the interface consistency within and among modules. Stub
analysis checks the consistency between the view taken by each referencing module of a particular stub,
and the consistency of each of these views against some “official” specification of that module. Finally,
update analysis compares two versions of the same submodule to look for changes in declarations,
requisition/provision specifications, or references to non-local entities. Although update analysis does
not directly address interface analysis, it is useful for identifying parts of the software which require
reanalysis following some change. In each case, different forms of these analyses are provided to
correspond to different types of (sub)modules.

The AdaPIC toolset is an instantiation of the PIC approach for use with large Ada systems. This
prototype toolset is being tailored to support consistent abstractions, incremental analysis, and
order-independent development so that an incremental approach to interface control can be adopted.

Safety analysis is a critical concern for complex systems such as the SDS. Even if error-free software
were a feasible objective, the possibility of failures in the operating environment still poses safety
risks. Leveson defines a technique called Software Fault Tree Analysis (SFTA) for analyzing the safety of
a software design, independently of its functionality. This technique can be performed at various levels
of abstraction, on either designs or code (the use of SFTA on Ada code is discussed in [Cha88]).

SFTA  is  derived  from  fault  tree  analysis  (FTA),  a  technique  originally  developed  to  measure  hardware 
safety.  This  genesis  has  the  benefit  of  allowing  hardware  and  software  fault  trees  to  be  linked  together  at 
their  interfaces,  so  that  an  entire  system  can  be  analyzed.  FTA  starts  with  a  hazard  analysis  of  the  system 
where  potential  hazards  are  identified  and  classified  according  to  their  severity.  As  part  of  this  process, 
failures  which  impact  system  safety  are  distinguished  from  nonsafety  failures;  safety  failures  are  those 
where  the  ability  to  provide  degraded  operation  takes  second  place  to  the  need  to  minimize  the  damage  of 
the  failure.  The  root  of  the  fault  tree  is  then  given  by  specifying  a  critical  failure  which  is  assumed  to  have 
occurred,  called  a  loss  event.  SFTA  uses  backwards  reasoning  to  identify  all  possible  conditions  which 
may  lead  to  this  failure,  building  the  fault  tree  by  showing  the  relationships  between  these  conditions.  This 
analysis  continues  until  all  the  leaves  susceptible  to  analysis  describe  events  of  calculable  probability.  At 
the  point  where  the  fault  tree  reaches  the  software  interface,  high-level  requirements  for  software  safety 
have  been  determined  based  on  software  behavior  which  could  compromise  system  safety.  SFTA  then  (1) 
demonstrates  that  the  design  logic  will  not  produce  safety  failures,  and  (2)  determines  the  environmental 
conditions  which  could  lead  to  a  software  generated  safety  failure. 
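
The backward expansion of a fault tree from the loss event to its leaves can be sketched as a computation of minimal cut sets, that is, the smallest combinations of leaf events sufficient to cause the root event. The tree encoding, the event names, and the function name below are hypothetical, and the sketch covers only AND and OR gates.

```python
# Illustrative sketch: expand a fault tree from its root (the loss event) down
# to leaf events and list the minimal combinations of leaves that cause it.
from itertools import product

def cut_sets(tree, event):
    """tree: event -> ('AND' | 'OR', [child events]); leaves are absent."""
    if event not in tree:
        return [frozenset([event])]
    gate, children = tree[event]
    child_sets = [cut_sets(tree, child) for child in children]
    if gate == "OR":       # any one child suffices
        combined = [s for sets in child_sets for s in sets]
    else:                  # AND: one cut set from every child, taken together
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    # keep only minimal sets (drop any set that strictly contains another)
    return [s for s in set(combined) if not any(o < s for o in combined)]

# Hypothetical tree for a single loss event.
tree = {
    "unintended actuator command": ("OR", ["software issues command",
                                           "hardware fault"]),
    "software issues command": ("AND", ["sensor value out of range",
                                        "range check omitted"]),
}
for cut in cut_sets(tree, "unintended actuator command"):
    print(sorted(cut))
```

Each cut set that reaches the software interface corresponds to a combination of software behaviors and environmental conditions that a safety requirement would need to rule out.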

In FTA, failure statistics can be produced and sensitivity analysis used to measure the effect of each
loss  event.  In  [Leve83a],  Leveson  describes  how  such  numerical  analysis  is  less  suited  for  the  software 
case  and  defines  the  following  uses  for  SFTA: 


•  Identification  of  the  most  likely  causes  of  a  loss  event,  which  can  guide  testing  and 
evaluation  efforts  and  pinpoint  those  areas  where  most  testing  dollars  should  be 
allocated. 

•  Identification of critical modules which require special fault-tolerance
procedures, e.g., run-time assertions, exception handling, or redundancy.

•  Identification of unsafe states and the conditions under which fail-soft and
fail-safe procedures should be invoked.

Tools to aid in the production and analysis of fault trees are being developed at the University of
California (Irvine). Researchers plan to use these tools to gain further understanding of SFTA.

4.4  Manual  Review  Techniques 

Manual review techniques such as structured walkthroughs [Myer78a] and code inspections [Faga76]
evolved from the simple desk checking approaches commonly used by software developers. Each type of
review is performed by a team of software developers, each of whom plays a well-defined role to focus
attention on different aspects of the program. The major distinction between these two techniques lies in
the goal of the review. The purpose of an inspection is to check for specific errors identified on an
error checklist. Walkthroughs have a wider scope of concern. In addition to manually simulating the
execution of the software, the participants review design decisions and the overall approach taken by the
developer. Although these reviews are expensive in terms of the manual effort required, they are among
the most cost-effective techniques for eliminating faults. They also offer a significant side benefit in
team building and ensuring back-up knowledge about a piece of software.

Walkthroughs and inspections can be applied in diverse ways. For example, a review may be conducted
by all participants acting in concert, or by participants reviewing the software product independently
and then pooling results. Table 4-1 shows results of a small-scale experiment which investigated
different applications of review techniques, as reported in [Myer78a]. Gannon has reported on the
frequency with which different types of errors were identified by using inspections and branch testing
independently and in conjunction [Gann79a]; see Figure 4-1.


Method                  Mean # of      Variance   Median # of    Range of
                        Errors Found              Errors Found   Errors Found

(label not recovered)   7.2            3.4        7.5            3-10           15    37
O. Combined A and C     7.6            4.3        8              5-10           14    15

Table 4-1. Comparison of Different Applications of Review Techniques


These review techniques are not restricted to implementation products. With the appropriate roles
and, in the case of inspections, lists of potential errors, they can be applied to any type of software
product.

4.5 Summary of Major Gaps in Static Analysis Technology

Many of the critical gaps identified for dynamic analysis technology also apply to static analysis. For
example, the issue of integrated application of techniques affects both dynamic and static analysis.
Quantitative information on the capabilities and costs of static analysis techniques is also needed.
While such capability profiles should be much simpler to determine for static than dynamic techniques,
this information is still unavailable. There is still a lack of proven capabilities for the static
analysis of concurrent software. Even more than in the case of dynamic analysis, existing techniques are
at the boundaries of the state-of-the-art and have only been applied on small, example problems. Much
additional research is needed. In particular, since static analysis of concurrent software is often an
NP-complete problem, a better understanding of how this analysis can be made sufficiently fast is
necessary. The need for transfer of promising technology into practical use is again a critical concern.

Figure 4-1. Percentage of Errors Discovered by Inspections and Branch Testing Using the Fortest System

The remainder of this subsection discusses three problems which have impeded the use and development
of static analysis technology. The first relates to the programmatic issue of requiring use of available
static analysis techniques. The second addresses the lack of common, formal representation forms for
early life cycle products. The third concerns the compatibility of modular interfaces.


4.5.1  Establishing  Static  Analysis  Policy 

The earlier parts of this section focused on the more advanced types of static analysis. Looking at the
entire body of software static analysis technology, some of which has been around for well over a decade,
there are many available techniques. Since most of these are fully automated, requiring little effort on
the part of a software developer, they are relatively cheap to apply. Although manual review techniques
are not, of course, automated, sufficient data has been collected to demonstrate their usefulness.
There should be a well-established policy requiring routine use of a core set of these techniques.
Instead of a collection of ad hoc tools, this core set (where appropriate) should be supported by a
well-designed collection of commercially developed, cooperating tools.

4.5.2  Common  Set  of  Formal  Representations  for  Pre-Implementation  Products 

With the exception of various simulation and modeling approaches, static analysis is the main vehicle
for examination of pre-implementation products. Unlike the dozen or so commonly used programming
languages, there is a great number of representation forms for early life cycle products. These languages
not only differ in syntax and semantics, but exploit a diverse variety of underlying conceptual models.
Many embody only a limited degree of formalism which has precluded rigorous analysis of early
development products. Lack of a widely-used, common set of languages has diffused the efforts of an
already small research community and impeded the development of a substantial body of static analysis
techniques (and supporting tools) for any particular language.

The need for increased formalism in early development activities has long been recognized. The reason
why it rarely occurs in practice lies in the difficulty of using formal languages. While not all
formalisms are, or need be, highly mathematical, the few existing formal languages generally employ
mathematical reasoning which is beyond the educational preparation of most software developers. Although
these languages could be designed for easier use, and better supported by automated tools, there is a
fine line between providing easy-to-use notations and losing the benefits of strict formalism.

4.5.3 Compatibility of Modular Interfaces

Although some work addressing the compatibility of modular interfaces is being performed, this is an
area deserving much more attention. In particular, compatibility of hardware/software interfaces needs to
be addressed. Inasmuch as systems design will emphasize modularity and the formal specification of
interfaces in the future, symbolic execution, testing, or formal verification methods for establishing
interface consistency are needed.

4.6 Automated Support for Dynamic and Static Analysis

Increased emphasis must be placed on the development of production quality, automated support
tools. Automated tools have long been an essential ingredient for effective and efficient testing and
evaluation. As techniques become more complex, tools which automate the application of these
techniques are increasingly indispensable. While extensive automation is not the solution to all testing
and evaluation problems, it will significantly reduce the traditionally human-intensive nature of testing
and evaluation. The provision of an integrated environment to support the use of state-of-the-art
analysis activities will be a key element in their success. It is worthwhile noting, however, that as
more sophisticated tools are developed to apply techniques in an integrated manner, software developers
will require more education in the use of tools. Guidance will be necessary to ensure the proper and
imaginative use of such tools in each type of circumstance.

At the turn of the decade, the majority of tools were those that provided analysis capabilities for
Fortran, Cobol, and PL/1 programs. Several reviews of available tools are to be found in the literature.
One of the most recent reviews was that conducted as part of the STEP effort [DeMi87a]. Over recent
years, the trend towards supporting analysis of these older languages has been changing and the majority
of new prototype tools are being developed to support the analysis of Ada programs. As research vehicles,
these prototypes are not designed to be robust, easy to use, or portable. In many cases, significant
effort will be required to develop production quality counterparts that are suitable for widespread use.
Yet it is vital that this effort be expended as part of the technology transfer activities necessary to
bring new technology into everyday practice.

While stand-alone tools applying individual techniques were very much the norm in the 1970’s, the
recognized need for multiple techniques cooperating in an integrated strategy is promoting the
development of a few powerful testing and evaluation environments. The Mothra environment [DeMi87b],
developed by the Georgia Institute of Technology and Purdue University, is one of the first such
environments. It currently provides mutation testing, structural testing, and a form of functional
testing. Symbolic testing is expected to be added within the year. The existing system supports analysis
of Fortran programs, and it has been distributed to a few sites for evaluation and beta testing. Another
version supporting analysis of Ada programs is expected to become available in the near future. Since
mutation testing is computationally intensive, researchers are investigating how to merge large numbers
of program mutants into a small set of highly vectorizable programs which can exploit the architecture of
vector processors, such as the Cray X-MP or the Alliant FX/8, for efficient execution of all the mutants
[Math88a].

Researchers at the University of Massachusetts are in the early stages of developing another extensive
Ada testing and evaluation environment [Clar88b]. Their intention is to support all the major, current
analysis approaches, including data flow testing, EQUATE, and RELAY. This environment, called the
Testing, Evaluation, and Analysis Medley (TEAM), is designed as a hierarchy of tools that is both flexible
and extensible. Accordingly, there are two important aspects to the design. First, the system architecture
is designed to provide layers of capabilities that support more advanced analysis techniques. Second, as
many general capabilities as possible are recognized. One tool in particular, the ARIES generic
interpreter, will be central to many other tools. ARIES has several distinctive features: (1) it is a
generic interpretation system which is instantiated to yield a customized interpreter for use in a
particular tool, (2) it is multilingual and can be used with a variety of procedural languages, and (3) it
is a multi-computational model system capable of supporting both conventional models of execution and a
variety of symbolic and data flow models. It is expected that this interpreter will provide the symbolic
interpretation capabilities required for an Ada symbolic evaluation system.

One of the goals of the TEAM environment is to allow researchers to conduct experimental studies of
existing analysis approaches. In particular, researchers plan to investigate the integration of analysis
techniques, analysis support for pre-implementation products, and incremental analysis of potentially
incomplete products. In addition to the emphasis on generic and language-independent capabilities
mentioned above, Clarke cites several additional requirements for TEAM [Clar88b]. These include effective
user interaction models providing natural interfaces which reduce the burden on the user and ensure
interface uniformity across tools. A process programming approach [Oste87a] will also be adopted such that
TEAM itself will prescribe the acceptable uses and interactions among tools, relieving the user of
unnecessary responsibilities and facilitating the inclusion of new capabilities.

This  effort,  as  a  whole,  is  being  carried  out  in  the  context  of  the  Arcadia  software  development 
environment [Tayl88]. The initial prototype of the TEAM environment will contain basic data flow and
symbolic  evaluation  capabilities  and  should  be  running  by  the  end  of  1988. 

The  prototype  software  and  hardware  development  environment  being  built  by  researchers  at  Stanford 
University  [Luck86a]  has  a  wider  focus  than  the  efforts  so  far  discussed.  This  environment  is  based  on  the 
use  of  wide-spectrum  languages  which  provide  a  notation  for  describing  the  intended  behavior  of  a  system 
and  the  implementation  of  that  behavior.  These  languages  are  ANNA,  TSL,  and  the  VHDL  Annotation 
Language  (VAL)  [Luck86a].  Special  emphasis  is  placed  on  distributed  computing,  both  in  providing  tools 
for  handling  concurrency  in  the  subject  system,  and  in  designing  tools  that  utilize  concurrency  in  the 
environment  itself.  ANNA  and  TSL  can  be  used  to  develop  specifications  of  Ada  systems  that  are 
susceptible to, respectively, symbolic execution and simulation. Annotations given in these languages can
also be automatically transformed into run-time checks which, as mentioned in Section 3.2, provide a
form of self-checking Ada software. The prototype environment contains most of the tools required to
support  these  functions,  in  various  stages  of  maturity.  In  addition  to  the  potential  power  of  the  testing  and 
evaluation  functions  supported,  this  environment  is  very  promising  in  its  ability  to  integrate  testing  and 
evaluation  into  system  development  activities. 
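
The idea of turning annotations into run-time checks can be sketched in Python. This is an illustration only, not ANNA or TSL: the `annotated` decorator, its `pre` and `post` parameters, and the `integer_sqrt` example are hypothetical names introduced here.

```python
import functools


def annotated(pre=None, post=None):
    """Wrap a function so that its annotations are checked on every call."""
    def decorate(func):
        @functools.wraps(func)
        def checked(*args):
            # Check the input annotation before the body runs.
            if pre is not None and not pre(*args):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args)
            # Check the result annotation after the body returns.
            if post is not None and not post(result, *args):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result
        return checked
    return decorate


@annotated(pre=lambda x: x >= 0,
           post=lambda r, x: r * r <= x < (r + 1) * (r + 1))
def integer_sqrt(x):
    """Largest integer r with r*r <= x (hypothetical example)."""
    r = 0
    while (r + 1) * (r + 1) <= x:
        r += 1
    return r
```

A call such as `integer_sqrt(17)` passes both checks, while `integer_sqrt(-1)` is rejected by the precondition before the body executes. The checks catch violations only on the inputs actually exercised; they do not prove the annotations.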

Looking  at  the  field  as  a  whole,  however,  it  is  clear  that  the  number  of  static  analysis  tools  is  much 
smaller than the number of dynamic analysis tools. In view of the complementary roles and benefits of
static and dynamic approaches, this imbalance should be rectified.

It  is  also  important  to  emphasize  the  critical  need  for  flexible  environments.  Any  environment 
(whether intended to support development activities, testing and evaluation activities, or both) that
cannot continue to integrate the increasing numbers and types of tools that will emerge in the coming years
will have a very short-lived usefulness. The necessary flexibility must be designed into an environment.
Although this requires additional upfront planning and expenditures in the environment development
process,  it  would  be  “penny-wise  and  pound-foolish”  to  follow  any  other  course. 

5. FORMAL VERIFICATION TECHNOLOGY


This section reviews the state-of-the-art in formal verification technology. After a brief overview of
what formal verification attempts to achieve and what benefits it offers, the current status of the
technology and ongoing research and development are reviewed. Finally, some high-level technology issues are
raised  and  recommendations  are  discussed. 

5.1  Focus  and  Benefits  of  Formal  Verification 

The  objective  of  formal  verification  is  to  prove,  using  mathematical  logic,  that  systems,  in  theory,  will 
behave  according  to  their  specifications.  The  term  formal  refers  to  the  level  of  rigor  required  in  system 
specifications  and  the  construction  of  valid  proofs.  English  is  not  a  precise  enough  language  for  stating 
technical specifications or arguing that systems satisfy their requirements. Mathematical notations with
rigorous definitions provide the necessary precision and allow system correctness arguments to be
checked by automated tools. The clause in theory is included to recognize that proofs apply to idealized
models of systems, not to the actual physical systems themselves. For example, verification of high-level
language software assumes the correct and reliable operation of compilers, run-time support systems, and
hardware. 

Several  ingredients  are  necessary  to  prove  properties  of  systems,  including: 

1.  Rigorous  definitions  or  specifications  of  the  behavior  and  performance  required 
to  achieve  each  property  —  typically  a  set  of  system  relationships  that  must  hold 
under  all  circumstances; 

2.  Rigorous  definitions  of  the  meaning  (that  is,  effects)  of  all  statements  and 
expressions  that  can  be  formulated  in  the  programming  language  used  —  also 
called  the  language’s  formal  semantics; 

3.  Rigorous  definitions  of  the  effects  of  each  machine  instruction  available  on  the 
target  hardware;  and 

4.  Sound  rules  of  logical  inference,  which  guarantee  that  only  true  assertions  can 
be  proved. 

Verification  and  testing  should  be  viewed  as  complementary  technologies.  Proofs  address  entire 
classes of possible circumstances, whereas testing exercises only a relatively small number of actual cases.
Verification is useful, therefore, to assure system properties that are impossible to test adequately.
Security in operating systems, for example, can be assured by proofs of security properties, but is extremely
difficult to assure through testing. Verification is not a substitute for testing, however, because tests can
be applied to actual physical systems as well as to idealized models. That is, testing can produce
counterexamples that invalidate assumptions upon which proofs are based. Proponents of formal verification are
not  likely  to  volunteer  to  fly  in  aircraft  that  have  been  verified  but  not  tested. 

5.2  Status  of  Current  Technology 

This section briefly surveys current verification techniques in three areas: software, hardware, and
systems.

5.2.1 Techniques for Software

The  principal  techniques  for  verifying  software  are  described  below,  starting  with  techniques  for  simple 
sequential software. These techniques apply to software designs expressed as high-level programs as well
as  to  actual  program  code.  Programming  language  features  that  complicate  the  verification  process  are 
then  briefly  discussed,  followed  by  descriptions  of  additional  techniques  for  verifying  concurrent  and 
parallel  software. 

5.2.1.1 Sequential Software

Techniques  for  proving  properties  of  programs  written  in  conventional  procedural  programming 
languages  are  the  oldest  and  most  well  known  to  software  and  system  developers.  These  techniques  were 
introduced  by  Floyd  [Floy67]  and  Hoare  [Hoar69],  and  have  been  refined  and  improved  by  numerous 
researchers, including Dijkstra [Dijk76a] and Gries [Grie81].

The  general  approach  assumes  that  programs  start  in  an  initial  state  with  a  set  of  initial  conditions,  and 
must complete in a state that satisfies a set of required final conditions. This may be represented
symbolically in two possible forms. The first is

initial_state { program } final_state
final_state → required_final_conditions

which  means:  starting  from  the  initial  state,  executing  the  program  will  result  in  the  final  state,  and  the 
final  state  satisfies  (implies)  the  required  final  conditions.  The  second  form  is 

initial_state → weakest_initial_conditions
weakest_initial_conditions { program } required_final_conditions

The  weakest  initial  conditions  are  the  minimal  conditions  under  which  executing  the  program  will 
always  result  in  satisfying  the  required  final  conditions.  This  second  form  is  preferred  for  developing  a 
program  and  its  proof  together  as  one  process. 

Programs  are  made  up  (typically)  of  sequences  of  statements.  Proof  rules  for  sequences  of  statements 
require  the  result  of  each  successive  statement  to  satisfy  the  weakest  preconditions  for  the  next  statement. 
Symbolically,  this  can  be  expressed  as 

weakest_initial_condition { statement_1 } intermediate_result
intermediate_result → weakest_intermediate_precondition
weakest_intermediate_precondition { statement_2 } required_result

Weakest preconditions are typically derived in reverse order, starting with the last statement in the
program. Proving that a program satisfies its requirements amounts to deriving the effects of each statement
and  proving  that  all  intermediate  results  imply  the  corresponding  intermediate  preconditions. 
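
The reverse-order derivation can be sketched in Python for programs made up only of assignment statements. This is a minimal illustration under stated assumptions: conditions are modeled as Python functions over a dictionary of variable values, and the names `wp_assign`, `wp_sequence`, `program`, and `required` are hypothetical.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var := expr`: the required condition,
    evaluated in the state that the assignment would produce."""
    return lambda state: post({**state, var: expr(state)})


def wp_sequence(statements, post):
    """Work backwards from the last statement to the first."""
    condition = post
    for var, expr in reversed(statements):
        condition = wp_assign(var, expr, condition)
    return condition


# Program:  x := x + 1;  y := 2 * x      Required result:  y > 10
program = [
    ("x", lambda s: s["x"] + 1),
    ("y", lambda s: 2 * s["x"]),
]
required = lambda s: s["y"] > 10

# Derived condition on the initial state; algebraically it is x > 4.
weakest = wp_sequence(program, required)
```

Evaluating `weakest` on a candidate initial state answers whether running the program from that state would satisfy the required result, without running the program itself.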

Additional  rules  for  interpreting  the  effects  of  assignment  statements,  conditional  statements,  loops, 
and  procedure  calls  can  be  spelled  out  in  terms  of  the  results  they  produce.  For  example,  a  conditional 
statement such as

IF test_condition THEN
    then_statements
ELSE
    else_statements
END IF;

can produce two possible results: one produced by the then_statements, when the test condition is true,
and one produced by the else_statements, when the test condition is false. Turning this around to look for
a weakest precondition to achieve a required result requires solving the following expressions:

weakest_precondition AND test_condition { then_statements } then_result
weakest_precondition AND NOT test_condition { else_statements } else_result
then_result → required_result
else_result → required_result

That  is,  the  weakest  precondition  for  an  if-then-else  statement  is  the  most  general  condition  that  allows 
both  alternative  results  to  satisfy  the  required  result. 
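
As a hedged sketch of the conditional rule, the following Python fragment combines the weakest preconditions of the two branches. Conditions are again modeled as functions over a dictionary of variable values; `wp_assign`, `wp_if`, and the absolute-value example are hypothetical names chosen for illustration.

```python
def wp_assign(var, expr, post):
    """Weakest precondition of `var := expr` with respect to `post`."""
    return lambda state: post({**state, var: expr(state)})


def wp_if(test, wp_then, wp_else):
    """When the test holds, the then-branch precondition must hold;
    otherwise the else-branch precondition must hold."""
    return lambda state: wp_then(state) if test(state) else wp_else(state)


# IF x < 0 THEN y := -x ELSE y := x END IF;    Required result:  y > 3
required = lambda s: s["y"] > 3
wp_abs = wp_if(
    lambda s: s["x"] < 0,
    wp_assign("y", lambda s: -s["x"], required),
    wp_assign("y", lambda s: s["x"], required),
)
```

For this example the derived condition is equivalent to `abs(x) > 3`, which is the most general condition under which both branches reach the required result.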

Loop statements that have conditional exits operate much like conditional statements. Loops,
however, require the derivation of another condition called the loop invariant. Consider, for example, the
loop statement:

WHILE NOT exit_condition LOOP
    loop_body
END LOOP;

If  the  exit  condition  is  false,  meaning  the  loop  body  will  be  executed,  there  are  conditions  that,  if  true 
before  the  loop  body  is  executed,  will  remain  true  afterwards.  The  loop  invariant  is  the  most  general  of 
these  conditions.  That  is, 

loop_invariant AND NOT exit_condition { loop_body } loop_body_result
loop_body_result → loop_invariant

The weakest precondition for a loop statement as a whole is independent of its exit condition, since the
exit condition may be either true or false at that point. The loop invariant is the critical precondition. In
fact, the invariant is the key to a loop’s overall behavior. The proof rule for a loop statement can be
formulated in terms of the invariant as follows:

loop_invariant { loop_statement } loop_invariant AND exit_condition

Loop  invariant  conditions  can  be  difficult  to  discover.  There  are  no  simple  rules  or  procedures  by 
which  they  can  be  automatically  derived. 
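
A small Python sketch may make the role of the invariant concrete. It checks a hand-chosen invariant with run-time assertions rather than proving it; the function name and the summation example are assumptions made for illustration.

```python
def total_with_checked_invariant(values):
    """Sum a list while asserting the loop invariant at each step."""
    index, total = 0, 0
    # Invariant: total equals the sum of the elements processed so far.
    invariant = lambda: total == sum(values[:index])

    assert invariant()                     # holds before the loop starts
    while not index == len(values):       # WHILE NOT exit_condition LOOP
        total += values[index]
        index += 1
        assert invariant()                 # preserved by the loop body
    # invariant AND exit_condition imply the required result.
    assert total == sum(values)
    return total
```

The invariant here had to be supplied by the programmer, which is the difficult step noted above; the assertions only confirm it on the inputs actually run.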

All that remains to complete the proof of a loop statement is to prove that the loop terminates. This
requires demonstrating that the loop body moves a step closer to making the exit condition true on each
iteration.  An  inductive  argument  can  then  be  made  to  guarantee  that  the  loop  will  eventually  terminate. 
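
The termination argument can be illustrated with a quantity that strictly decreases on every iteration and is bounded below. The sketch uses Euclid's algorithm as an assumed example, with the second operand as the decreasing quantity; the function name is hypothetical.

```python
def gcd_with_variant(a, b):
    """Euclid's algorithm with a run-time check of the termination measure."""
    assert a >= 0 and b >= 0
    while not b == 0:                      # exit condition: b == 0
        measure_before = b
        a, b = b, a % b
        # The measure is a non-negative integer that strictly decreases,
        # so the exit condition must eventually become true.
        assert 0 <= b < measure_before
    return a
```

Because a non-negative integer cannot decrease forever, the inductive argument described above applies to this loop.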

Function  and  procedure  subprograms  can  simplify  large  programs.  They  also  simplify  program 
verification. A proof of a subprogram’s behavior can be derived once and used as a lemma everywhere the
subprogram is called. Function calls must be embedded within statements. Any input restrictions
imposed on parameters or global variables by a function, therefore, must apply to the containing
statement. A symbolic expression representing the value returned by the function can then be substituted in
the  proof  rules. 

Proof  rules  similar  to  those  described  above  for  loops  and  conditional  statements  can  be  formulated  for 
procedure  calls.  They  will  typically  include  input  restrictions  imposed  on  parameters  and  global  variables, 
and  will  define  the  resulting  conditions  that  will  hold  when  the  procedure  returns  control  to  the  calling 
program.  These  proof  rules  generally  have  the  form 

preconditions → input_restrictions
input_restrictions { procedure_call } resulting_conditions
resulting_conditions → required_results

where input parameter and global variable restrictions are applied to the actual arguments and the current
environment,  and  the  resulting  conditions  are  similarly  transferred  to  reflect  the  new  program  state  upon 
return. 

Recursive  functions  and  procedures  require  termination  arguments  similar  to  those  required  for  loops. 
It  must  be  shown  that  every  recursive  call  solves  a  simpler  version  of  the  original  problem  and  that,  at 
some  point,  a  solution  can  be  reached  directly,  without  further  recursion. 
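
The same argument for recursion can be sketched as follows. The `power` function and its `_previous` bookkeeping parameter are hypothetical; the assertions record that each call works on a strictly smaller exponent and that exponent zero is solved directly.

```python
def power(base, exponent, _previous=None):
    """Compute base**exponent by recursion on a non-negative exponent."""
    assert exponent >= 0
    # Each recursive call must solve a simpler version of the problem.
    assert _previous is None or exponent < _previous
    if exponent == 0:
        return 1                           # solved directly, no recursion
    return base * power(base, exponent - 1, _previous=exponent)
```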

5.2.1.2 Complications

This brief tutorial on verification techniques oversimplifies the task. There are numerous
complications in proving properties of real system designs and real application software. The first complication is
that proofs are more detailed, involve more steps, and are more tedious than the designs or programs
themselves. While much of the tedium can be mitigated by automated tools, constructing proofs is still a
demanding  task. 

Other  sources  of  complication  stem  from  language  characteristics.  In  Ada,  for  example, 

•  Side  effects  from  functions  can  change  the  state  of  a  program  in  the  middle  of 
evaluating  expressions, 

•  Exceptions  can  leave  results  of  computations  in  undefined  states,  and 

•  Aliasing  of  procedure  parameters  and  global  data  can  have  additional  side  effects. 

These characteristics make rules for interpreting statements more complex and require software
developers to verify many more intermediate conditions to produce complete proofs. One of the reasons
for  recommending  that  properties  of  software  be  proven  as  an  integral  part  of  the  development  process  is 
that  many  of  these  complications  can  be  avoided  by  restricting  use  of  these  language  features. 
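
The aliasing complication can be illustrated outside Ada. In the Python sketch below, one-element lists stand in for parameters passed by reference; `add_twice` and the variable names are hypothetical. A result condition derived on the assumption that the two parameters are distinct objects fails when the same object is passed for both.

```python
def add_twice(target, source):
    """Intended result: target = old target + 2 * old source."""
    target[0] += source[0]
    target[0] += source[0]


# Distinct arguments: the intended result condition holds.
a, b = [1], [2]
add_twice(a, b)
distinct_result = a[0]     # 1 + 2*2 = 5

# Aliased arguments: the first update changes `source` as well,
# so the result is 4 rather than 1 + 2*1 = 3.
c = [1]
add_twice(c, c)
aliased_result = c[0]
```

A proof rule for calls to `add_twice` would therefore need an additional input restriction stating that the arguments do not overlap, or the rule must account for both cases.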

5.2.1.3 Concurrent Software

The  introduction  of  concurrency  (multitasking)  in  programming  languages  adds  a  new  dimension  to  the 
verification  problem.  The  simple  model  of  a  single  sequence  of  statements  is  no  longer  adequate  to 
describe  computations.  Several  sequences  of  statements  may  be  active  at  one  time  and  their  execution 
may  overlap  and  interleave  in  arbitrary  ways,  making  it  virtually  impossible  to  isolate  a  program’s  state  at 
any  particular  point. 

Properties  of  concurrent  software  can  be  divided  into  two  major  classes  [Lamp83]: 

•  Liveness properties, which identify program behavior that must be achieved
(e.g., that tasks respond correctly and cooperate to satisfy the program’s
requirements); and

•  Safety properties, which identify inconsistent or untenable program states that
must never arise.

Liveness properties can be analyzed using process modeling techniques such as Petri nets [Pete77] or
message passing schemes such as Communicating Sequential Processes (CSP) [Hoar85]. Tasking models
usually assume that tasks interact only in well-defined, controlled ways; namely, when they synchronize with
each  other  or  exchange  messages.  This  allows  the  techniques  for  verifying  sequential  program  code  to  be 
applied to the behavior of individual tasks between synchronization points. Proofs of concurrent
programs, therefore, attempt to show that:

1.  Modeled  tasks  satisfy  the  program’s  required  liveness  properties; 

2.  Program  tasks  and  synchronization  points  correspond  directly  with  modeled 
tasks; 

3.  Synchronization  points  are  the  only  places  where  tasks  interact;  and 

4.  Individual  tasks,  in  isolation,  satisfy  their  own  input-output  requirements. 

Safety properties are invariant conditions similar to those described for loops. These invariants,
however, are global to the program and must hold across all tasks. An example safety property is freedom
from deadlock. The invariant condition to be proven is that no tasks will ever be blocked waiting for
synchronization with each other in such a way that none of them can ever proceed.
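
For very small task models, freedom from deadlock can be checked by brute force rather than proved. The Python sketch below is such a check, not one of the proof techniques discussed in this section: it explores every interleaving of tasks that acquire and release locks and reports reachable states in which some task is unfinished and no task can proceed. The task encoding and the name `find_deadlocks` are assumptions made for illustration.

```python
from collections import deque


def find_deadlocks(tasks):
    """Return reachable states where unfinished tasks are all blocked.

    Each task is a list of ("acquire", lock) or ("release", lock) steps.
    A state is (program counters, lock owners); an owner of None is free.
    """
    locks = sorted({lock for task in tasks for _, lock in task})
    start = (tuple(0 for _ in tasks), tuple(None for _ in locks))
    seen, queue, deadlocks = {start}, deque([start]), []
    while queue:
        pcs, owners = queue.popleft()
        successors = []
        for i, task in enumerate(tasks):
            if pcs[i] == len(task):
                continue                   # task i has terminated
            action, lock = task[pcs[i]]
            k = locks.index(lock)
            if action == "acquire" and owners[k] is not None:
                continue                   # task i is blocked on this lock
            new_owner = i if action == "acquire" else None
            successors.append((
                pcs[:i] + (pcs[i] + 1,) + pcs[i + 1:],
                owners[:k] + (new_owner,) + owners[k + 1:],
            ))
        finished = all(pc == len(task) for pc, task in zip(pcs, tasks))
        if not successors and not finished:
            deadlocks.append((pcs, owners))
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return deadlocks


# Two tasks that take the same two locks in opposite orders.
opposite_order = [
    [("acquire", "A"), ("acquire", "B"), ("release", "B"), ("release", "A")],
    [("acquire", "B"), ("acquire", "A"), ("release", "A"), ("release", "B")],
]
# The same tasks when both take the locks in the same order.
same_order = [opposite_order[0], opposite_order[0]]
```

With `opposite_order` the search finds the state in which each task holds one lock and waits for the other; with `same_order` it finds none. Exhaustive search of this kind grows rapidly with the number of tasks, which is one reason invariant-based proofs are of interest.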

Temporal logic [Pnue77] is a technique used to reason about both liveness and safety properties of
concurrent programs. Temporal logic extends the predicate calculus with expressions that indicate time
dependencies such as henceforth and eventually. For example, liveness properties are usually proven by
showing that a program progresses from one stable, global state to another. The exact sequence of
individual steps, however, cannot be determined because of their concurrent execution. A temporal logic
specification of such a transition is

HENCEFORTH initial_stable_state → EVENTUALLY next_stable_state

Stable  states  are  typically  task  synchronization  points,  including  task  initiation  and  termination. 

Temporal logic obeys a full set of algebraic laws that allow formal reasoning about the behavior of
concurrent programs. For example, the following equivalence relation captures the concept that a condition P
is not always true exactly when it eventually becomes false:

NOT (HENCEFORTH condition_P) ≡ EVENTUALLY (NOT condition_P)
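
The two operators can be given a simplified reading over finite execution traces, which is enough to check the equivalence mechanically on small cases. Temporal logic is normally interpreted over unbounded executions, so the Python sketch below is an approximation under that stated assumption; the function names are hypothetical, and `leads_to` reads the transition specification as HENCEFORTH (p → EVENTUALLY q).

```python
from itertools import product


def henceforth(condition, trace):
    """The condition holds in every state of the trace."""
    return all(condition(state) for state in trace)


def eventually(condition, trace):
    """The condition holds in at least one state of the trace."""
    return any(condition(state) for state in trace)


def leads_to(p, q, trace):
    """Every state satisfying p is followed (or accompanied) by a q state."""
    return all(not p(trace[i]) or eventually(q, trace[i:])
               for i in range(len(trace)))


def duality_holds(max_length=5):
    """Check NOT HENCEFORTH P == EVENTUALLY NOT P on all short traces."""
    p = lambda state: state
    not_p = lambda state: not state
    return all(
        (not henceforth(p, trace)) == eventually(not_p, trace)
        for n in range(max_length + 1)
        for trace in product([False, True], repeat=n)
    )
```

For example, `leads_to` holds on a trace in which every "requested" state is later followed by a "granted" state, and fails on a trace that ends while a request is still outstanding.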

5.2.1.4 Parallel Software

Parallel software is distinguished as a special case of concurrent software where concurrent operations
are performed synchronously. The techniques described here are directed toward fine-grained machine parallelism
rather than the general multi-tasking models of concurrency previously discussed.

Recently,  Chandy  and  Misra  [Chan88]  introduced  a  new  unified  theory  of  parallel  computation.  Their 
approach  addresses  a  wide  range  of  granularity  in  concurrent  operations  and  applies  to  a  wide  range  of 
parallel machine architectures. In this model, program execution starts from an initial state and
repeatedly selects and executes assignment statements nondeterministically. Each assignment statement may be
guarded by a conditional expression and may assign values to multiple variables in parallel. The only
constraint on the nondeterminism is that every assignment must be selected and executed infinitely often.
The theory is simplified by not including conventional program control flow in the model and by assuming
that  all  programs  run  forever.  In  practice,  programs  terminate  when  they  reach  a  fixed  point,  where  all 
open  assignment  statements  leave  the  program  state  unchanged. 
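
A rough Python sketch of this execution model follows. It is an illustration under stated assumptions, not the notation of [Chan88]: the program state is a tuple, each guarded assignment is a function from state to state, and a round-robin sweep is used as one particular fair selection order. The names `run_to_fixed_point` and `swap_if_out_of_order` are hypothetical.

```python
def swap_if_out_of_order(i):
    """Guarded multiple assignment:
    if a[i] > a[i+1] then a[i], a[i+1] := a[i+1], a[i]."""
    def statement(a):
        if a[i] > a[i + 1]:
            return a[:i] + (a[i + 1], a[i]) + a[i + 2:]
        return a                           # guard false: state unchanged
    return statement


def run_to_fixed_point(state, statements):
    """Execute every statement repeatedly until none changes the state."""
    changed = True
    while changed:
        changed = False
        for statement in statements:       # round-robin: every statement
            new_state = statement(state)   # is selected again and again
            if new_state != state:
                state, changed = new_state, True
    return state


initial = (3, 1, 2)
statements = [swap_if_out_of_order(i) for i in range(len(initial) - 1)]
final = run_to_fixed_point(initial, statements)
```

The fixed point reached from `(3, 1, 2)` is the sorted tuple, since at that point every guard is false and no statement can change the state. No control flow appears in the program itself; the ordering of steps belongs entirely to the scheduler.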

Although  this  model  is  very  simple,  it  is  fully  adequate  for  expressing  and  analyzing  useful  and  efficient 
parallel  computations.  One  advantage  of  a  simple  model  is  that  formal  semantics  and  proofs  of  properties 
can  be  greatly  simplified.  This  model  also  appears  to  support  methods  for  efficiently  mapping  programs 
onto  several  types  of  shared-memory  multiprocessor  machine  architectures. 

5.2.2  Techniques  for  Hardware 

Verification  of  high-level  language  software  assumes  the  correct  and  reliable  operation  of  compilers, 
run-time  support  systems,  and  hardware.  Proofs  of  properties  at  the  abstract  program  level  are  of  limited 
value  if  the  correctness  of  translation  and  execution  cannot  be  equally  assured.  One  approach  to  solving 
this  problem  is  to  define  a  series  of  abstract  machines,  each  of  which  can  be  emulated  by  the  next  machine 
using  a  relatively  simple  (that  is,  provable)  set  of  software  macros.  At  the  top  end  of  the  series  is  an 
abstract  machine  that  directly  executes  the  high-level  program.  At  the  bottom  end  is  a  machine  that  maps 
directly  to  the  target  hardware.  That  is,  there  is  a  one-to-one  correspondence  between  the  last  abstract 
machine  and  the  design  of  the  actual  physical  hardware.  This  technique  transcends  the  machine 
instruction-set  level  and  can  be  used  to  prove  program  properties  down  to  the  micro-code  and  hardware 
gate  level. 

5.2.3  Techniques  for  Systems 

Formal  verification  of  complete  hardware  and  software  systems  is  an  active  research  area.  Capabilities 
that  system  verification  will  require  include: 

•  Formal  system-level  specification  techniques  —  including  hardware  and  software 
performance  specification  and  analysis  techniques. 

•  Compatible modeling techniques for diverse system components that, for
example, allow proven properties of hardware components to be incorporated in
proofs  of  software  components. 

•  System  construction  techniques  that  allow  composition  of  proven  components  to 
yield  provable  systems. 

5.2.4 Automated Support

Automated  tools  for  verification  include  processors  for  formal  specification  and  program  annotation 
languages,  verification  condition  generators,  theorem  provers,  and  proof  checkers.  Specification  and 
annotation  languages  are  typically  variations  of  the  predicate  calculus.  Verification  condition  generators 
attempt  to  describe  symbolically  the  state  transitions  made  by  each  statement  in  a  program.  Proofs  are 
made  up  of  arguments  that  program  statements  achieve  specified  conditions.  Theorem  provers  attempt  to 
construct  proofs  automatically  using  artificial  intelligence  techniques.  Proof  checkers  are  simpler  systems 
that  can  verify  correct  proofs  created  manually  or  using  other  sources  of  automated  help. 

First-generation  verification  systems  that  have  been  developed  include  AFFIRM  [Gerh80,Suns77], 
FDM [Kemm80], Gypsy [Good86a,Ambl76a], HDM [Robi79], and the Stanford Pascal Verifier [Luck77].
The first four of these systems were reviewed and evaluated in an extensive report by Kemmerer
[Kemm86]. The FDM and Gypsy systems have been approved by the National Computer Security Center
(NCSC) for the verification of systems targeted for A1 security certification. NCSC has also stated that
the Gypsy system will be used in all verification of SDS software. These systems are complicated,
however, and it is not easy to transfer knowledge gained on one system to another.

5.3  On-Going  Research  and  Development  Efforts 

This  section  discusses  current  research  and  development  activities  being  conducted  to  advance  the 
state-of-the-art  of  verification  theory  and  practice. 

5.3.1 Basic Research

Basic  research  is  needed  to  develop  understanding  of  several  fundamental  open  verification  problems. 
These  problems  are  wider  than  simple  gaps  in  the  current  technology  and  are  likely  to  take  some  time  to 
solve. Work is currently being done in these areas, but no results have been reported. The following
sections describe current and needed efforts in the areas of real-time systems, distributed systems, and
degraded  system  operation. 

5.3.1.1 Real-Time Systems

The critical property to be verified in most real-time systems is that processes meet their deadlines.
Whether they do is affected by the algorithms employed, efficiency of object code generated by the compiler,
scheduling policies and performance of the run-time system, and target hardware performance. The
problem, therefore, spans the entire system design, which is what makes it so difficult.

In the past the deadline problem has been addressed by worst-case analysis, which tends to produce
overly pessimistic solutions. That is, systems are overbuilt for normal operations to ensure that they can
handle the extreme worst-case deadline situations. Specific techniques employed include: algorithms with
fixed execution times, assembly language or hand-optimized object code, fixed priorities, deterministic
scheduling, and target hardware upgrades. Each of these techniques simplifies the deadline verification
problem, but they often restrict capabilities that could be supported under normal (slack) operating
conditions.
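
A worst-case deadline check of the simplest kind can be sketched as follows. The sketch assumes a fixed, non-preemptive execution order and known worst-case execution times; the process names, time units, and the function `meets_deadlines` are hypothetical.

```python
def meets_deadlines(processes):
    """processes: list of (name, worst_case_time, deadline) tuples,
    executed in the listed order without preemption.
    Returns True when every process finishes by its deadline even if
    each one takes its full worst-case time."""
    elapsed = 0
    for name, worst_case_time, deadline in processes:
        elapsed += worst_case_time
        if elapsed > deadline:
            return False
    return True


# Times in arbitrary units (hypothetical figures).
schedule = [("sense", 2, 5), ("track", 4, 8), ("report", 3, 10)]
overloaded = [("sense", 2, 5), ("track", 6, 8), ("report", 3, 10)]
```

The check passes for `schedule` and fails for `overloaded`. If typical execution times are well below the worst-case figures, a schedule that passes this check leaves the processor idle much of the time, which is the pessimism described above.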

Adaptive  algorithms  and  scheduling  techniques  that  adjust  to  system  workload  have  been  introduced  to 
gain  processing  capabilities  during  slack  periods.  For  example,  when  system  workload  is  light,  slower  but 
more accurate algorithms can be used and useful background operations can be performed. As the
workload picks up, background operations are dropped and required processes can switch to faster
algorithms. In addition, as a process nears its deadline, it may try to increase its scheduling priority to assure
its completion. Formal methods are needed for reasoning about these techniques that would allow
development of proofs of adaptive program behavior.
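
The adaptive selection described above might look like the following in outline. This is a sketch only: the option names, timing figures, accuracy scores, and the function `choose_algorithm` are all hypothetical.

```python
def choose_algorithm(options, time_available):
    """options: list of (name, worst_case_time, accuracy) tuples.
    Pick the most accurate option whose worst-case time still fits
    in the time remaining before the deadline."""
    feasible = [option for option in options if option[1] <= time_available]
    if not feasible:
        raise ValueError("no algorithm fits the remaining time")
    return max(feasible, key=lambda option: option[2])[0]


options = [
    ("coarse", 1, 0.80),
    ("refined", 5, 0.95),
    ("exhaustive", 20, 0.99),
]
```

With ample slack the selector returns the slowest and most accurate option, and it falls back to faster options as the available time shrinks. A proof of deadline behavior for such a program has to cover every choice the selector can make, which is part of what makes adaptive behavior hard to verify.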

5.3.1.2  Distributed  Systems 

Distributed  systems  are  characterized  by  communication  latencies  between  subsystems.  This  makes  it 
extremely difficult for subsystems to synchronize their actions. Timing constraints on coordination, for
example,  may  not  allow  subsystems  to  fully  verify  each  other’s  actions  or  readiness.  That  is,  they  may 
have  to  proceed  on  the  assumption  that  the  other  subsystems  are  performing  their  functions  at  the  right 
time.  Formal  verification  of  such  systems  requires  methods  for  reasoning  about  system  behavior  that  do 
not require a representation of the system’s global state, which cannot be known because of the
uncertainty of the timing of individual local transitions.

5.3.1.3  Degraded  System  Operation 

Fault-tolerant  systems  are  able  to  recover  from  or  adapt  to  certain  types  of  component  failures.  Many 
such  systems  may  continue  to  operate  in  a  degraded  mode  until  the  failed  component  can  be  repaired  or 
replaced. Current verification techniques assume correct operation of underlying hardware and
peripheral devices such as sensors. Methods for reasoning about system behavior in the presence of potential
component failures are needed to verify fault-tolerant systems.

5.3.2  Applied  Research  and  Development 

Applied research and development addresses technology gaps that do not require significant
breakthroughs in fundamental understanding. In most cases these problems can be solved by creative
application of known engineering techniques and careful implementation. The primary activity in this category is
development  of  production-quality  automated  tools. 

A second generation of verification tools is currently in development. These tools are intended,
primarily, to improve the utility of earlier tools by assuring the soundness of the underlying logic system,
standardizing  on  programming  and  annotation  languages  (for  example,  Ada  and  ANNA),  improving  user 
interfaces,  and  improving  performance.  Examples  of  such  efforts  include  the  Annotated  Verifiable  Ada 
(AVA)  system  being  developed  by  Computational  Logic,  Inc.  and  the  Penelope  system  being  developed 
by  Odyssey  Research  Associates. 

5.4  Application  Issues 

This  section  discusses  three  important  issues  relating  to  the  application  of  verification  technology: 

•  Identification  of  critical  properties  and  components  within  a  system  that  will 
require  verification, 

•  Education  and  training  in  verification  techniques,  and 

•  Insertion  of  verification  technology  into  software  and  system  development 
processes. 

5.4.1 Identification of Critical Properties and Components

5.4.1.1  Critical  Properties 

Critical properties of a system as a whole must be identified as early as possible in the development
process. These properties affect critical design decisions in partitioning and allocating functional
responsibilities within a system. Properties that must be proven at the system level imply requirements for
components with proven properties and construction techniques that preserve those properties.

5.4.1.2  Critical  Components 

Critical  system  components  that  will  require  verification  also  need  to  be  identified  as  early  as  possible 
in  the  development  process.  Proving  properties  of  components  is  much  easier  when  the  proof  process  is 
made an integral part of their design and implementation. In fact, proofs of correct components may be
impossible  to  construct  after  the  fact,  because  of  design  decisions  and  programming  practices  that 
increase  the  complexity  of  proofs.  Early  identification  of  critical  components  can  significantly  reduce  the 
needed  verification  effort. 

5.4.1.3  Levels  of  Criticality 

As a corollary to identifying critical properties and components, identification of levels of criticality
would  help  in  assessing  verification  requirements  and  allocating  assurance  resources  between  testing  and 
verification. 

5.4.2  Education  and  Training 

Formal  verification  requires  high  levels  of  skill  and  maturity  in  logic  and  abstract  mathematics.  This  is 
not  likely  to  change  even  with  high-quality  automated  tools.  Tools  make  programmers  more  productive  by 
handling the tedious details of proofs, but generating proofs will still require abstract analytical skills,
understanding  of  proof  techniques,  and  considerable  mathematical  sophistication. 

These  skills  are  not  commonly  taught  in  computer  science  courses  today.  Instead,  students  must  take 
theoretical  mathematics  courses,  which  may  be  (or  seem)  quite  unrelated  to  verification  applications.  This 
arrangement  does  not  produce  enough  graduates  with  sufficient  mathematical  skills  to  enable  the  industry 
to  verify  software,  hardware,  and  systems  on  a  regular  basis.  Changing  computer  science  and  related 
engineering  curricula  to  include  foundations  for,  and  direct  applications  of,  formal  verification  could 
alleviate  this  shortage. 

5.4.3  Technology  Insertion 

Two factors that would facilitate the application of verification techniques within the industry are: (1)
access to production-quality verification tools, and (2) publication of worked examples of formal software
and system specifications and proofs. Earlier versions of verification tools were primarily research
vehicles built for experimentation in academic environments. Newer versions should be much more robust,
easier to use, and easier to move from one machine to another. These tools can therefore be made
available to a much wider community of potential users. Anyone who might consider employing formal
verification techniques should never be discouraged by the lack of available tools.

Most people learn by following examples. Small, textbook examples are useful for learning the basic
concepts of formal verification. Larger examples are necessary to demonstrate how these concepts scale
up to verifying full-fledged systems. Case studies of full-scale verification projects are needed to provide
models of how the technology can be applied and how the efforts should be managed.

6. SOFTWARE MEASUREMENT TECHNOLOGY


This section of the report discusses the field of software measurement. An introduction to the various
types of software metrics is given, along with a discussion of some of the early and existing measurement
efforts. Finally, a discussion of future directions in the field of software measurement is presented.

6.1  Introduction 

Software measurement deals with the understanding, characterizing, evaluating, predicting, and
controlling of software products and processes. This field has traditionally been associated with the application
of metrics to software products and processes. The majority of these software metrics, when used in a
context-free fashion, have not yielded results which are demonstrably useful to either software developers
or managers. Consequently, current efforts are directed at providing a framework within the software
development life cycle which will facilitate better understanding of the metrics values and aid in creating
higher quality software products.

6.2  Types  of  Metrics 

Software  metrics  are  typically  classified  as  either  process  metrics  or  product  metrics.  Process  metrics 
are  measures  which  quantify  attributes  of  the  development  process  and  of  the  development  environment. 
Product  metrics  are  measures  of  various  characteristics  of  a  software  product.  A  general  discussion  on 
process  and  product  metrics  can  be  found  in  [Cont86]. 

Examples of process metrics include the education level of programmers, the degree of automated tool
support, and the number of design walkthroughs. These metrics have the potential to provide feedback
during the software development process and help managers to predict or monitor the progress and the
utilization of resources within a project. While some studies have shown significant statistical correlation
between the metrics and measured quantities relating to cost or fault rates, these results often hold true
in only a particular environment. Little research has been performed on metrics which can be
effectively applied across environments. In addition, the estimation models have usually given limited
consideration to the underlying software development paradigm (for example, waterfall versus
prototyping).

Product metrics can be classified as external or internal product metrics. External product metrics are
those that rely on data collected during testing and actual use of a software product. They include
performance metrics, maintainability metrics (as measured through cost of maintenance), and testability
metrics (as measured through the cost of testing and the number of post-release errors). They also include
reliability metrics (as measured through fault rates); these reliability metrics are discussed in more detail
in Section 7. These metrics are all direct measures of quantitative attributes and, as such, are excellent
indicators to the extent that the test data set used to derive the metrics matches the data actually
encountered in operation.

Internal product metrics are metrics which rely on examination and static analysis of a software product.
The goal of these types of metrics is to provide an indirect measure of the same attributes measured by
external product metrics, but in a more cost-effective way. Internal product metrics can be collected much
earlier in the software development process than external product metrics, thus providing improved
feedback that can guide the project. Examples of high-level metrics include complexity, portability,
correctness, and maintainability. These are usually assessed through examination of easily-measured low-level
metrics such as the maximum level of nesting, the use of user-defined data types, and the number of lines
of code in a program. Internal product metrics are the most controversial. It is generally agreed that
single low-level metrics (for example, lines of code) do not provide enough information to derive a high-level
indicator (for example, level of effort required, number of faults, or complexity) [KafuSSa]. Moreover, it
has not been conclusively demonstrated, through statistically significant experiments operating from a
sound theoretical basis, that the low-level attributes actually relate to the high-level attributes of concern,
and there is a lack of empirical data for assessing the value of using sets of such metrics.

Empirical validation of these metrics has been hampered for three main reasons. First, the data used
to derive a researcher’s metric is commonly obtained artificially from a controlled experiment. Often,
data is gathered by studying software developed during programming courses, or
collected from small, non-typical programs found in industry. While controlled metric experiments are
useful for exploring new, unknown phenomena and may lead to statistically significant results, they often do
not scale up in real-world case studies [Basi86a]. Second, the statistics generated from a metric are
sometimes questionable. Because little data is used in the development of the metrics, their validity in
the general case is not established. Finally, it is often not clear what a particular metric is measuring. Metrics are
described in general terms (for example, program complexity) which leave the object of measurement
unclear.

Another problem associated with the use of metrics is the feasibility of artificially manipulating
software so that desired metric values are achieved while not changing the functional characteristics in
any beneficial way. Any measurable criterion serves as a motivator for project personnel, so the use of
context-free metrics must be closely scrutinized. While artificial manipulation is more difficult when
several sets of metrics are specified, it can still lead to incidents where increasing the value of measured
attributes becomes the goal of the development effort, rather than increasing the inherent quality of the
software.

Internal  product  metrics  are  often  used  to  provide  input  into  various  software  cost  estimation  models 
[Boeh84a].  These  models  analyze  various  software  quality  factors  to  determine  the  level  of  effort  required 
for  software  development,  and  provide  a  means  for  resource  estimation  and  allocation. 

Until these metrics are better understood, they must be used with caution. This is not to say they are
without value; indeed, metrics can often be useful in indicating software which may require closer
scrutiny. Low-level metrics, in particular, should be looked upon as indicators of desirable and undesirable
software characteristics, and not used for ascertaining the intrinsic worth of software.

6.3  Early  Metrics  Research  Efforts 

Much of the early work in software measurement dealt with software complexity metrics. These types of
metrics sought to provide quantitative estimates of program complexity by measuring a variety of software
attributes. It is commonly accepted that complex programs are more difficult to understand, maintain,
and modify than simple programs. One of the earliest and simplest measures for assessing software
complexity is the Lines Of Code (LOC) metric. This measure derives from when software was entered on
punched cards. Each card typically represented a line of code and, as such, it was easy to compare and
contrast the size of the physical card decks. While punched cards are no longer a common medium for
software, the term remains as the simplest measure of program size. Program size is correlated with
software complexity in that, usually, when size increases, so does complexity. The LOC measure is one of
the easiest complexity measures to derive and is usually used as a benchmark for comparing other
complexity metrics.
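
As a rough illustration of the LOC measure, the following Python sketch counts physical, blank,
comment-only, and code lines in a source text. The function name count_loc and the four-way
classification are assumptions made for this example, not a definition taken from the literature cited
here; the default comment prefix "--" assumes Ada-style comments.

```python
def count_loc(source: str, comment_prefix: str = "--") -> dict:
    """Classify each physical line as blank, comment-only, or code."""
    counts = {"physical": 0, "blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        counts["physical"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            # A line holding code plus a trailing comment counts as code.
            counts["code"] += 1
    return counts


# Hypothetical Ada fragment used only to exercise the counter.
SAMPLE = "-- header\nprocedure P is\n\nbegin\n   null;  -- no-op\nend P;\n"
print(count_loc(SAMPLE))
```

Even this simple measure depends on counting conventions (for example, whether comment-only lines are
included), so LOC figures are comparable only when the same convention is used throughout.
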

In 1976, McCabe introduced a graph-theoretic complexity measure which provided initial insight into
the management and control of program complexity [McCa76]. McCabe’s metric, named the Cyclomatic
Complexity Metric, is based on graphs representing the control flow of a program. Programs with higher
numbers of basic control paths generate a higher cyclomatic number and are therefore deemed more
difficult to understand, modify, and maintain. The cyclomatic number is used as an indicator of those
modules that may contain code which will be difficult to test and maintain. While McCabe’s complexity
metric is being used in industry as a useful indicator of potential software problems, it has not been shown
to provide a better estimate of program complexity than the LOC measure [Hame82,Evan83a].
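
McCabe's measure can be sketched from its usual graph-theoretic definition, V(G) = E - N + 2P, where E
is the number of edges, N the number of nodes, and P the number of connected components of the
control-flow graph. The formula is not stated in this report and is supplied here as background; the
function name, the edge-list representation, and the example graph are illustrative assumptions.

```python
def cyclomatic_complexity(edges, nodes=None, num_components=1):
    """Return V(G) = E - N + 2P for a control-flow graph.

    edges          -- iterable of (source, target) pairs
    nodes          -- optional iterable of node names; needed only for
                      nodes that appear in no edge
    num_components -- P, the number of connected components
                      (1 for a single routine)
    """
    edge_list = list(edges)
    node_set = set(nodes or [])
    for source, target in edge_list:
        node_set.add(source)
        node_set.add(target)
    return len(edge_list) - len(node_set) + 2 * num_components


# Hypothetical routine: an if/else followed by a while loop.
EXAMPLE_EDGES = [
    ("entry", "if"),
    ("if", "then"), ("if", "else"),
    ("then", "join"), ("else", "join"),
    ("join", "while"),
    ("while", "body"), ("body", "while"),
    ("while", "exit"),
]

print(cyclomatic_complexity(EXAMPLE_EDGES))  # 9 edges - 8 nodes + 2 = 3
```

For this example the result equals the number of decision points (two) plus one, which is the usual
shortcut for structured code.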

In 1977, Halstead published a monograph entitled “Elements of Software Science” [Hals77a]. Halstead
claimed  that  the  metrics  of  Software  Science  were  firmly  based  on  the  methods  and  principles  of  classical 
experimental  science,  and  that  the  measuring  process  could  be  reduced  to  a  few  mathematical  equations. 
Immediately  after  its  publication,  a  considerable  debate  arose  over  whether  or  not  the  Software  Science 
metrics  represented  software  in  general,  or  only  responded  to  a  limited  class  of  software 
[Albr83,B^e79,Curt79a,Shen83,Zweb79].  However,  in  recent  years  Halstead’s  work  has  been  shown  to 
have  serious  theoretical  flaws  which  render  Software  Science  equivalent  to  simple  LOC  metrics 
[Card87a]. 
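
For reference, the basic Software Science quantities are commonly computed from four counts: the number
of distinct operators (n1) and distinct operands (n2), and the total occurrences of each (N1 and N2). The
sketch below uses the commonly cited formulas for vocabulary, length, volume, difficulty, and effort. These
formulas are background added here rather than material from this report, and the token lists are assumed
to come from a language-specific tokenizer that is not shown.

```python
import math


def halstead_measures(operators, operands):
    """Compute basic Software Science measures from token lists."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total occurrences
    if n2 == 0:
        raise ValueError("at least one operand is required")
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2) * (N2 / n2)
    return {
        "vocabulary": vocabulary,
        "length": length,
        "volume": volume,
        "difficulty": difficulty,
        "effort": difficulty * volume,
    }


# Tokens for the single statement  x := x + 1;  (token classification assumed).
EXAMPLE = halstead_measures(operators=[":=", "+", ";"], operands=["x", "x", "1"])
print(EXAMPLE["length"], round(EXAMPLE["volume"], 2))
```

Because length and volume grow directly with the number of tokens, it is unsurprising that these measures
tend to track program size.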

6.4  Early  Measurement  Research  Efforts 

This section of the report discusses two major research efforts which occurred early in the software
measurement field. The National Aeronautics and Space Administration (NASA) Software Engineering
Laboratory (SEL) created an organization which has provided a unique ability to study the
implementation and results of a large data gathering effort. The Rome Air Development Center (RADC) has funded a
variety of software measurement research efforts which have contributed to the understanding of software
metrics.

6.4.1 Software Engineering Laboratory

The SEL is a joint venture between the NASA/Goddard Space Flight Center, the University of
Maryland, and the Computer Sciences Corporation. One of the goals of the SEL has been to improve
understanding of the impact that metric usage has on productivity and the quality of software products
[Basi85a]. It has identified various metrics that are useful for evaluating and predicting the complexity,
quality, and cost of Ada programs [BasiSSa]. In addition, extensive data collection techniques have been
developed which provide the basis for further metric research. The SEL utilizes NASA-developed,
operational software to provide the basis for empirical investigation into various aspects of metrics
research and validation.

6.4.2  Rome  Air  Development  Center  Software  Quality  Work 

The RADC has been involved in a long-term program to improve and control software quality. One of
the key goals of this effort has been an attempt to identify the major issues of software quality and provide
a well-defined process whereby the software quality of Air Force weapon systems can be better specified
and measured. RADC’s initial work was to identify and define a set of software quality factors that are
relevant throughout the software development lifecycle [McCa77]. Table 6-1 reproduces these quality
factors, together with the primary user concern that each factor is perceived as representing. The various
user concerns are grouped into three acquisition concerns, representing how well a product performs,
how well it is designed, and how adaptable it is.

User Concern                                                          Quality Factor
--------------------------------------------------------------------  ---------------

Performance - How well does it function?
  How well does it utilize a resource?                                Efficiency
  How secure is it?                                                   Integrity
  What confidence can be placed in what it does?                      Reliability
  How well will it perform under adverse conditions?                  Survivability
  How easy is it to use?                                              Usability

Design - How valid is the design?
  How well does it conform to the requirements?                       Correctness
  How easy is it to repair?                                           Maintainability
  How easy is it to verify its performance?                           Verifiability

Adaptation - How adaptable is it?
  How easy is it to expand or upgrade its capability or performance?  Expandability
  How easy is it to change?                                           Flexibility
  How easy is it to interface with another system?                    Interoperability
  How easy is it to transport?                                        Portability
  How easy is it to convert for use in another application?           Reusability

Table 6-1. RADC Quality Concerns


Figure 6-1 shows how various quality factors are decomposed into a hierarchical software quality
measurement framework where each factor is broken down into several criteria. Each criterion is then further
subdivided into a set of metrics. The 13 quality factors are composed of 29 criteria, and 73 metrics have
been defined, consisting of over 300 lower-level metrics. It is the combination of these lower-level metrics
which ultimately generates a high-level software quality factor. Although there is little empirical evidence
to validate these correlations, general relationships have been validated (for example, low coupling
between modules seems to produce more maintainable software). Currently, these metrics and factors
serve only as guides to, or simple indicators of, a program’s quality.
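
The factor, criterion, and metric hierarchy can be pictured as a small tree in which scores are rolled up
from the bottom. The sketch below is only illustrative: the criteria placed under each factor, the metric
names, the scores, and the use of an unweighted mean at each level are all assumptions, since this report
does not state how the RADC framework combines lower-level metrics.

```python
from statistics import mean

# Hypothetical scores in the range 0.0 to 1.0,
# keyed factor -> criterion -> metric.
QUALITY_TREE = {
    "Maintainability": {
        "modularity": {"module_coupling": 0.8, "module_size": 0.6},
        "simplicity": {"nesting_depth": 0.5, "branch_count": 0.9},
    },
    "Portability": {
        "independence": {"machine_dependent_units": 0.4},
    },
}


def criterion_score(metrics):
    """Roll metric scores up into one criterion score (unweighted mean)."""
    return mean(list(metrics.values()))


def factor_scores(tree):
    """Roll criterion scores up into one score per quality factor."""
    return {
        factor: mean([criterion_score(m) for m in criteria.values()])
        for factor, criteria in tree.items()
    }


for factor, score in factor_scores(QUALITY_TREE).items():
    print(f"{factor}: {score:.2f}")
```

A weighted combination could be substituted at either level; the point is only that a factor score is
derived from, and is no better than, the low-level measurements beneath it.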

In 1978, RADC and the US Army Computer Systems Command sponsored enhancements to the initial
metrics framework [McCa80]. These enhancements provide a project manager with a description of those
quality factors typically considered to be the most important. An Automated Quality Measurement Tool
has been developed to automate the collection of specific metric data and to provide various quality
measurement results.

In  1979,  RADC  sponsored  research  into  software  quality  issues  regarding  software  reusability  and 
interoperability  [RADC83a].  Metrics  for  assessing  the  quality  of  networked  computers  and  distributed 
systems  have  also  been  developed  [RADC83b]. 

Figure 6-1. Software Quality Model

6.5 Existing Automated Support

Although much metric research is language-independent, implementation of automated product
metrics usually requires tools to be geared to a specific language. While there are many metrics tools
available for the analysis of software written in languages such as FORTRAN and COBOL, tools tailored for
the Ada language are only beginning to emerge. However, since much of the DOD-sponsored metric
research is currently focusing on automated support for the Ada language, increasing numbers of
Ada-oriented tools can be expected over the next several years.

The remainder of this subsection outlines the major current Ada-oriented metric tools and efforts.


6.5.1  AdaMAT 

Dynamics Research Corporation has developed an Ada Measurement and Analysis Tool (AdaMAT) to
provide automated metric analysis of Ada software [Perk86]. AdaMAT supports a metrics framework
which measures six software criteria (anomaly management, independence, modularity,
self-descriptiveness, simplicity, and system clarity) supported by 150 software metric elements. It consists of
three separate tools: (1) a data collection tool for static analysis of the Ada software, (2) a quality analysis
component providing an interactive analysis of the code and isolation of problem or unusual code, and (3)
a report generator which collects the results of the completed analysis. AdaMAT metrics are arrayed in a
hierarchy based upon the RADC metrics framework [Cava78] with tailorings for the Ada language. In this
framework, the lowest level metrics are data items, which produce information on such concerns as
maximum level of nesting and number of “out” parameters in a procedure. At the next level up,
metric-elements use the data items to provide information such as local types referenced and local levels of
nesting. Metric-elements are then used to create software quality subcriteria which in turn provide
information on such concerns as flow simplicity and error prevention. Finally, at the highest level, software quality
criteria provide information on general aspects of software qualities, such as modularity, simplicity, and
anomaly management.

6.5.2 ATVS

General Research Corporation, under contract to RADC, is developing an Ada Test and Verification
System (ATVS), which is not yet complete [RADC86]. ATVS is a test and measurement tool which
provides static and dynamic analysis of Ada source code in addition to the collection of software quality
measurement data. Static analysis includes call dependency, task termination dependency, and potential
circular deadlock detection. Dynamic analysis is achieved by instrumenting the source code with probes,
and yields information about test coverage, timing, and tasking activity. In addition, ATVS will
provide for the translation of manually entered assertions into executable code. Software quality
measurement data is collected during testing activities. This data is then made available to both the user and other
tools which are planned to be integrated into a software development environment that RADC is
constructing.
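
The idea of instrumenting code with probes can be illustrated in miniature. The sketch below is written in
Python rather than Ada and is not based on the ATVS design; the probe decorator, the call_counts and
elapsed_seconds tables, and the two example routines are hypothetical names introduced for this
illustration. Each probe records that a routine was executed and how long it ran, which is the raw material
for simple coverage and timing reports.

```python
import functools
import time
from collections import defaultdict

call_counts = defaultdict(int)        # routine name -> number of executions
elapsed_seconds = defaultdict(float)  # routine name -> accumulated run time


def probe(func):
    """Wrap a routine so that each call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if the routine raised an exception.
            call_counts[func.__name__] += 1
            elapsed_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@probe
def classify(value):
    return "negative" if value < 0 else "non-negative"


@probe
def never_called():
    return None


def routine_coverage(names):
    """Report which instrumented routines have been executed at least once."""
    return {name: call_counts.get(name, 0) > 0 for name in names}
```

After a test run that calls classify but not never_called, routine_coverage(["classify", "never_called"])
reports the second routine as unexercised, which is the kind of feedback a coverage analyzer provides at a
much finer grain.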

6.5.3 Software Metrics Data Collection (SMDC)

Developed at Purdue University, the Software Metrics Data Collection (SMDC) system [Yu88a]
provides a comprehensive repository of data which can function as a testbed for the detailed analysis of
information related to software development. SMDC is an APL-based system which runs in a UNIX 4.3
BSD environment. It provides an extensive facility for the mathematical and statistical manipulation of
data collected during software development with a view towards metric analysis. The data currently
resident within SMDC has been acquired from the public domain, industry, academic, military, and other
sources. Metrics such as development effort, duration, Software Science, Cyclomatic Complexity, LOC,
and others are collected and stored in the SMDC.

6.5.4  The  NOSC  Tools 

In 1983, the Naval Ocean Systems Center (NOSC), under contract to the World Wide Military
Command and Control System (WWMCCS) Information System (WIS), contracted for a wide selection of
software tools to be written in, and for, the Ada language. This software, collectively known as the NOSC
tools, includes automated support for such tasks as database management, graphical interfacing, text
processing, project management, and metric analysis. One of the metrics tools provides an implementation
of the Software Science, Cyclomatic Complexity, and LOC complexity measures specifically tailored to
the Ada language. The NOSC software resides in the public domain.

6.6  Future  Directions  in  Measurement  Technology 

There are several areas which need to be improved if effective software measurement is to be achieved.
A solid measurement methodology must be developed which will provide a framework from which
appropriate metrics can be selected for a project and data collection and validation can be facilitated.
Methods which provide better feedback of the results of metric analysis into software development
activities are needed. In addition, there is a need for extensive automation of various tools and techniques to
support the entire measurement process and its integration into software development.

6.6.1 Measurement Methodology

Metrics  have  traditionally  been  applied  to  software  products  and  processes  in  a  somewhat  bottom-up, 
stand-alone  fashion.  The  measurement  strategy  typically  revolves  around  a  collection  of  independent 
metrics which are applied to the software undergoing analysis. Measurement begins when coding nears
completion,  and  ends  when  acceptable  levels  of  desirable  attributes  are  attained. 

It  is  generally  recognized  that  the  current  application  of  stand-alone,  evaluative  measures  of  software 
development  products  or  processes  does  not  yield  results  that  can  be  effectively  interpreted,  compared, 
or  validated.  A  methodology  is  required  to  integrate  the  diverse  aspects  of  establishing  measurement 
requirements,  guiding  the  selection  of  appropriate  metrics,  and  supporting  the  collection,  interpretation, 
and  validation  of  results. 

Basili and Rombach have proposed a software engineering process model which seeks to achieve this
objective  [Basi88a].  The  model  is  based  on  two  paradigms:  the  Quality  Improvement  paradigm,  which 
provides  a  guide  to  improving  the  software  development  process,  and  the  Goal/Question/Metric 
(G/Q/M)  paradigm,  which  guides  the  selection  of  appropriate  measures. 

The Quality Improvement paradigm proposes a sequence of six steps which guide activities necessary to
better understand and improve the software construction process. In the first step, the current project
environment is characterized. This step attempts to identify the various factors which will influence the
project development (for example, problem domain, personnel factors, product factors, and available
resources). Second, goals for a successful project development are set. Example goals are improvement
of the quality of the product, a reduction in production costs, and achievement of a stated software
reliability threshold for a product. Third, the appropriate methods and tools for the project are chosen with
the objective of maximizing project goals. Fourth, the software is developed, and data related to the goals,
methods, and tools of the project are collected. Data can be gathered from forms, interviews, and
automated tools. The Quality Improvement paradigm does not specify what data to collect or how the
data is to be collected, but only provides a basic framework so that each specific step can be detailed (and
perhaps automated) by the organization. The fifth step is a post mortem study of the gathered data in
order to evaluate current practices, including both the development and measurement processes and tools,
determine problems, and make recommendations for the improvement of these practices in future
projects. The final step simply requires that the organization actually build upon and exploit the
knowledge gained from this data collection and analysis in subsequent projects.

The G/Q/M paradigm provides (1) an operational formulation of the second step of the Quality
Improvement paradigm and (2) the glue to tie together all the steps of the Quality Improvement paradigm. Here,
an  approach  is  specified  for  determining  and  specifying  the  goals  of  a  software  development  project. 
These  goals  are  then  refined  into  a  set  of  quantifiable  questions  which  provide  the  basis  for  determining 
the  appropriate  software  metrics  and  the  data  to  be  collected.  Automated  templates  and  guidelines  are 
provided  to  assist  in  the  derivation  of  these  goals,  questions,  and  metrics. 
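
The refinement from goals to questions to metrics can be represented with a very small data model. The
sketch below is not the template format used by Basili and Rombach; the class names, the example goal, its
questions, and the metric names are all hypothetical. It shows only how the set of data to collect falls out
of the stated goals rather than being fixed in advance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Question:
    text: str
    metrics: List[str] = field(default_factory=list)


@dataclass
class Goal:
    statement: str
    questions: List[Question] = field(default_factory=list)


def metrics_to_collect(goals):
    """Return each metric once, in the order the goals first require it."""
    ordered = []
    for goal in goals:
        for question in goal.questions:
            for metric in question.metrics:
                if metric not in ordered:
                    ordered.append(metric)
    return ordered


# Hypothetical project goal refined into questions and metrics.
RELIABILITY_GOAL = Goal(
    statement="Reduce faults found after release",
    questions=[
        Question("Where are faults introduced?",
                 ["faults_per_phase", "faults_per_module"]),
        Question("Which modules are hardest to test?",
                 ["cyclomatic_complexity", "faults_per_module"]),
    ],
)

print(metrics_to_collect([RELIABILITY_GOAL]))
```

A project with different goals would yield a different metric list from the same procedure, which is the
sense in which the derived metrics depend on the project.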

The  G/Q/M  paradigm  is  innovative  in  that  the  derived  metrics  depend  heavily  upon  the  goals  and 
characteristics  of  the  specific  project  or  organization.  It  recognizes  that  the  goals  of  most  projects  are 
different,  and  seeks  to  end  the  dependence  of  an  organization  upon  a  single  set  of  metrics  by  which  all 
software  development  efforts  must  be  measured.  The  G/Q/M  and  improvement  paradigms  have  been 
successfully  applied  to  several  industrial  settings  outside  of  the  SEL  [Romb87a,Romb87b]. 

Basili  and  Rombach  are  constructing  an  environment  called  TAME  (Tailoring  A  Measurement 
Environment)  to  support  the  Quality  Improvement  and  G/Q/M  paradigms  [Basi88a].  A  series  of  TAME 
prototypes  which  support  the  measurement  of  Ada  projects  are  currently  being  developed  [Basi87a]. 

6.6.2 Integration of Measurement Into Software Development

The  majority  of  current  measurement  programs  employ  a  variety  of  stand-alone  tools  to  perform  metric 
analysis.  However,  the  coherent  specification,  collection,  and  analysis  of  metrics  requires  an  integrated 
environment  for  measurement;  integrated  in  the  sense  of  feeding  back  metric  information  into  the 
software  development  process  so  that  both  current  and  future  development  can  be  improved.  Too  often, 
the  results  of  metric  analysis  are  used  only  for  evaluating  software,  and  not  for  learning  how  to  better 
design,  implement,  and  measure  software.  The  TAME  environment  (mentioned  above)  is  an  encouraging 
effort  aimed  at  providing  an  integrated  environment  in  which  software  process  specification  languages  are 
used  to  describe  both  the  development  and  measurement  processes  as  well  as  their  interfaces. 

Selby has developed a set of guidelines for incorporating metrics into a software development
environment [Selb87a]. These guidelines address the varying scope of metrics an environment should possess (for
example, product as well as process metrics and design as well as code metrics), and also the method for
collecting and analyzing metrics.

6.6.3  Needs  for  Future  Automated  Support 

Automated tools are required for effective and affordable software measurement. While there exist
many tools which deal with software metric analysis, few automate the activities of evaluating, predicting,
controlling, and learning. This is not surprising, given the traditional focus on the use of simple,
stand-alone software metrics. Efforts must be directed at developing a complete measurement environment
which supports a comprehensive measurement methodology.

7. SOFTWARE RELIABILITY ASSESSMENT TECHNOLOGY

This section discusses software reliability, one of the factors that determine software quality. Software
reliability is singled out for further discussion for two reasons:

1. Highly reliable software is essential in the SDS, and

2. Software reliability is perceived as being unique among the software quality factors.

Musa, for example, claims that software reliability is “probably the most important of the characteristics
inherent in the concept ‘software quality’” and “the most readily quantifiable of the attributes of software
quality” [Musa87].

7.1  Scope 

As its title suggests, this section focuses on how to assess software reliability. It is not intended to cover
the  issue  of  how  to  achieve  software  reliability  (or,  perhaps  more  accurately,  how  to  achieve  reliable 
software).  Nevertheless,  to  place  this  section  in  context,  it  is  worthwhile  to  at  least  identify  the  basic 
approaches  that  can  be  used  to  enhance  software  reliability. 

Clearly,  the  most  powerful  approach  is  fault  prevention.  Here,  the  aim  is  to  improve  the  software 
development  process  itself,  so  that  faults  are  never  introduced  into  the  software.  Ideally,  the  software 
development  process  would  enable  the  construction  of  fault-free  software.  The  entire  field  of  software 
engineering  is  directed  at  improving  the  software  development  process. 

Another  approach  to  increasing  software  reliability  is  to  facilitate  the  detection  and  correction  of  faults 
and errors. Four technologies embodying this general approach are covered in the preceding sections of
this report: dynamic analysis, static analysis, formal verification, and software measurement (that is, software
quality evaluation).

A  third  approach  to  increasing  software  reliability  is  software  fault  tolerance,  in  particular,  software- 
implemented  methods  for  enhancing  tolerance  to  software  faults.  This  approach  was  briefly  discussed  in 
Section  2.1.3.  Its  aim  is  to  minimize  or  eliminate  the  impact  of  software  faults.  The  most  prominent 
software  fault  tolerance  methods  are  recovery  blocks  [Rand75,Ande76a]  and  N-version  programming 
[Aviz85,Knig86a].  These  methods  have  been  developed  in  recognition  of  the  fact  that  software  faults  can¬ 
not  be  totally  prevented  or  eliminated,  but  that  their  impact  (at  least  in  the  form  of  critical  failures)  must 
be  minimized. 

7.2  Current  Methodology 

An  overview  of  the  state  of  the  art  in  software  reliability  assessment  is  presented  in  this  subsection. 
The  point  of  this  overview  is  to  clarify  what  is  meant  by  software  reliability,  and,  moreover,  to  indicate 
what  type  of  work  is  being  done  in  the  name  of  software  reliability  assessment.  For  a  comprehensive 
treatment  of  the  subject  of  software  reliability  assessment,  the  reader  is  referred  to 
[Farr83,Goel85,Musa87,Rama82]. 

7.2.1 Definition of Software Reliability

The term software reliability has taken on a narrow meaning in the software engineering literature, in
particular a much narrower meaning than those not conversant with the literature might suppose.
According to Musa, software reliability is the “probability of failure-free operation of a program for a
specified time in a specified environment” [Musa87]. Software failure, in turn, is defined as the “departure
of program operation from requirements” [Musa87]. At this point, subjectivity enters Musa’s stream
of definitions: requirements are not defined, but only discussed. Musa concludes that they can include
both explicit and implicit needs. Thus, a behavior can be classified as a failure on the basis of unstated
requirements, which are inherently subjective. Environment is equated with operational profile, which is
defined as “the set of all possible input states (input space) with their associated probabilities of
occurrence.” As noted in [Rama82], “the software need be correct only for inputs for which it is
designed (specified environment).”

Finally,  although  time  may  seem  like  a  straightforward  concept,  it  can  be  interpreted  in  a  number  of 
ways,  to  suit  the  application  at  hand.  According  to  [Goel85],  it  “may  mean  a  single  run,  a  number  of  runs, 
or  time  expressed  in  calendar  or  execution  time  units.” 

7.2.2  General  Approach  to  Software  Reliability  Assessment 

The  problem  that  has  received  the  most  attention  in  the  field  of  software  reliability  assessment  is  that  of 
estimating  future  failure  behavior  from  past  failure  behavior.  In  Goel’s  words  [Goel85], 


A  commonly  used  approach  for  measuring  software  reliability  is  via  an  analytical  model 
whose  parameters  are  generally  estimated  from  available  data  on  software  failures.  Relia¬ 
bility  and  other  relevant  measures  are  then  computed  from  the  fitted  model. 


The  analytical  model  is  referred  to  as  a  software  reliability  model.  Numerous  software  reliability  models 
have  been  proposed  [Farr83,Goel85,Musa87,Rama82]. 

Some  work  has  been  done  on  predicting  reliability  from  properties  of  the  software  and  the  process  by 
which  it  was  developed  [Musa87,McCa87a].  This  work  is  not  addressed  here,  because  it  is  so  closely  asso¬ 
ciated  with  software  metrics,  the  topic  of  the  previous  section  of  this  report.  In  particular,  the  conclu¬ 
sions  of  the  previous  section  apply  to  the  specific  case  of  software  reliability  prediction. 

7.2.3  Classification  of  Software  Reliability  Models 

In  [Goel85],  Goel  divides  software  reliability  models  into  four  broad  classes: 

•  Times  Between  Failures  Models:  In  these  models,  the  time  between  failures  is 
treated as a random variable, drawn from a distribution whose parameters depend
on the number of faults remaining in the program. Typically, the time between
failures is assumed to increase as faults are detected (and subsequently
corrected).  Hence,  these  models  are  sometimes  referred  to  as  software  reliability 
growth  models.  Estimates  of  the  parameters  are  obtained  from  the  observed 
values  of  times  between  failures.  Estimates  of  software  reliability,  mean  time  to 
next  failure,  etc.,  are  then  obtained  from  the  fitted  model. 

•  Failure  Count  Models:  In  this  class  of  models,  the  failure  process  is  represented  as 
a  stochastic  process  with  a  time  dependent  failure  rate.  Again,  it  is  typically 
assumed that the failure rate decreases over time. Parameters of the failure rate
can be estimated from the observed values of failure counts or failure times.

•  Fault  Seeding  Models:  In  these  models,  a  known  number  of  faults  is  seeded  into  a 
program  with  an  unknown  number  of  indigenous  faults.  Based  on  the  number  of 
seeded  and  indigenous  faults  discovered  during  testing,  an  estimate  of  the  original 
(that  is,  prior  to  seeding)  fault  content  of  the  program  is  made. 

•  Input  Domain  Based  Models:  In  the  basic  model  of  this  class,  test  cases  are  gen¬ 
erated  randomly  from  an  operational  profile  that  is  assumed  to  be  representative 
of  the  real  usage  of  the  program.  Based  on  the  number  of  failures  observed  during 
execution  of  the  test  cases,  an  estimate  of  program  reliability  is  obtained. 

Models  of  the  first  two  classes  are  sometimes  referred  to  collectively  as  time  domain  models.  Time 
domain  models  are  the  best  established  and  most  widely  used  models. 
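
The last two classes reduce to simple point estimates, which the following minimal sketch illustrates. The function names are hypothetical, and the fault seeding formula is the classic capture-recapture proportion (indigenous faults are assumed to be found at the same rate as seeded ones); it is an assumption of this sketch rather than a formula taken from [Goel85].

```python
def seeded_fault_estimate(seeded_total, seeded_found, indigenous_found):
    """Estimate the indigenous fault content prior to seeding.

    Assumes seeded and indigenous faults are equally likely to be
    detected, so the fraction of seeded faults found approximates
    the fraction of indigenous faults found.
    """
    if seeded_found <= 0:
        raise ValueError("no seeded faults found; the estimate is undefined")
    if seeded_found > seeded_total:
        raise ValueError("cannot find more seeded faults than were seeded")
    return indigenous_found * seeded_total / seeded_found


def input_domain_reliability(runs, failures):
    """Point estimate of per-run reliability from test cases drawn
    at random from the operational profile."""
    if runs <= 0:
        raise ValueError("at least one run is required")
    if not 0 <= failures <= runs:
        raise ValueError("failures must lie between 0 and runs")
    return 1.0 - failures / runs


if __name__ == "__main__":
    # 20 faults seeded, 10 of them found, along with 15 indigenous faults.
    print(seeded_fault_estimate(20, 10, 15))
    # 1000 random test cases, 3 of which failed.
    print(input_domain_reliability(1000, 3))
```

Both estimates are only as good as their sampling assumptions: seeded faults must resemble indigenous ones, and the test profile must resemble actual usage.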

7.2.4  Time  Domain  Models 

Time domain models are strongly advocated by Musa, Iannino, and Okumoto in [Musa87]. In this
book,  the  authors  discuss  the  theory  and  application  of  the  models.  They  present  a  model  classification 
scheme,  describe  several  specific  models  in  each  class,  and  offer  a  set  of  model  comparison  criteria.  The 
criteria  include  predictive  validity,  capability,  quality  of  assumptions,  applicability,  and  simplicity.  On  the 
basis  of  these  criteria,  two  models,  both  of  which  fall  into  the  failure  count  model  category,  are  judged 
superior  to  the  others:  the  basic  execution  time  model  and  the  logarithmic  Poisson  execution  time  model. 

Both  models  interpret  “time”  to  be  execution  time,  and  both  provide  for  a  mapping  from  execution 
time  to  calendar  time.  Furthermore,  both  assume  that  the  failure  process  is  a  nonhomogeneous  Poisson 
process.  That  is,  failures  occur  according  to  a  Poisson  process,  with  a  time  varying  rate.  The  rate  is 
referred  to  as  the  failure  intensity  (mean  number  of  failures  per  unit  time). 

The fundamental concepts underlying the two models are depicted in Figure 7-1, which is extracted from
[Musa87]. In the basic execution time model, the failure intensity $\lambda$ decreases linearly with the mean
failures experienced $\mu$:

$$\lambda(\mu) = \lambda_0 \left(1 - \frac{\mu}{\nu_0}\right),$$

where $\lambda_0$ is the initial failure intensity and $\nu_0$ is the total number of failures that would occur in infinite
time. In the logarithmic Poisson execution time model, the failure intensity decreases exponentially:

$$\lambda(\mu) = \lambda_0 \exp(-\theta\mu),$$

where $\lambda_0$ again represents the initial failure intensity and $\theta$ is the failure intensity decay parameter. As
explained in [Musa87], the basic execution time model represents the case in which the discovery of each
failure (and subsequent repair of underlying faults) leads to a constant reduction in failure intensity (of the
initial failure intensity divided by the total number of failures, $\lambda_0/\nu_0$). The logarithmic Poisson
execution time model, on the other hand,
represents the case in which early failures lead to greater reductions in failure intensity than later failures.
In other words, the “benefit” accrued by repair processes (initiated in response to failures) decreases
exponentially as a function of the number of failures.
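
The contrast between the two models can be checked numerically. The sketch below assumes the functional forms $\lambda(\mu) = \lambda_0(1 - \mu/\nu_0)$ and $\lambda(\mu) = \lambda_0 \exp(-\theta\mu)$ for the basic and logarithmic Poisson models, respectively; the function names and the parameter values are illustrative only and are not taken from [Musa87].

```python
import math


def basic_intensity(mu, lambda0, nu0):
    """Basic execution time model: intensity falls linearly in mu."""
    return lambda0 * (1.0 - mu / nu0)


def log_poisson_intensity(mu, lambda0, theta):
    """Logarithmic Poisson model: intensity falls exponentially in mu."""
    return lambda0 * math.exp(-theta * mu)


def drop_per_failure(intensity, mu, **params):
    """Reduction in failure intensity between mu and mu + 1 failures."""
    return intensity(mu, **params) - intensity(mu + 1, **params)


if __name__ == "__main__":
    # Illustrative parameters: lambda0 = 10 failures per unit time,
    # nu0 = 100 total failures, theta = 0.02 per failure.
    for mu in (0, 25, 50):
        linear = drop_per_failure(basic_intensity, mu, lambda0=10.0, nu0=100.0)
        expo = drop_per_failure(log_poisson_intensity, mu, lambda0=10.0, theta=0.02)
        print(mu, linear, expo)
```

Under the basic model the drop per failure is the same at every value of $\mu$; under the logarithmic Poisson model it shrinks as failures accumulate.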

Figure 7-1. Failure Intensity Functions (failure intensity $\lambda$ versus mean failures experienced $\mu$)

As indicated in the figure and equations, each model has two parameters, one being the initial failure
intensity $\lambda_0$ and the other representing failure intensity change (total failures $\nu_0$ in the basic execution time
model and failure intensity decay parameter $\theta$ in the logarithmic Poisson execution time model). The
values of these parameters may be estimated from failure data via standard statistical techniques
[Musa87]. In addition, for the basic execution time model, parameter values may be predicted, prior to
execution of the software, from characteristics of the software [Musa87].

Several  useful  measures  can  be  derived  from  these  models.  They  include  the  expected  number  of 
failures  to  reach  a  specified  failure  intensity  objective  and  the  expected  execution  time  to  reach  a  specified 
failure  intensity  objective.  These  measures  are  also  detailed  in  [Musa87]. 
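
As a rough sketch of how such measures follow from the models, the functions below compute the additional failures and additional execution time needed to move from a present failure intensity to an objective. The closed forms are obtained here by integrating $d\mu/d\tau = \lambda(\mu)$ for the assumed forms $\lambda(\mu) = \lambda_0(1 - \mu/\nu_0)$ and $\lambda(\mu) = \lambda_0 \exp(-\theta\mu)$; the function and parameter names are hypothetical.

```python
import math


def basic_additional_failures(lam_present, lam_objective, lambda0, nu0):
    """Expected failures to go from lam_present to lam_objective (basic model)."""
    return (nu0 / lambda0) * (lam_present - lam_objective)


def basic_additional_time(lam_present, lam_objective, lambda0, nu0):
    """Expected execution time to reach lam_objective (basic model)."""
    return (nu0 / lambda0) * math.log(lam_present / lam_objective)


def log_poisson_additional_failures(lam_present, lam_objective, theta):
    """Expected failures to reach lam_objective (logarithmic Poisson model)."""
    return math.log(lam_present / lam_objective) / theta


def log_poisson_additional_time(lam_present, lam_objective, theta):
    """Expected execution time to reach lam_objective (logarithmic Poisson model)."""
    return (1.0 / lam_objective - 1.0 / lam_present) / theta


if __name__ == "__main__":
    # Illustrative values: reduce the failure intensity from 10 to 1
    # failures per unit of execution time.
    print(basic_additional_failures(10.0, 1.0, lambda0=10.0, nu0=100.0))
    print(basic_additional_time(10.0, 1.0, lambda0=10.0, nu0=100.0))
    print(log_poisson_additional_failures(10.0, 1.0, theta=0.02))
    print(log_poisson_additional_time(10.0, 1.0, theta=0.02))
```

Each function requires a positive objective below the present intensity; tightening the objective toward zero drives the required execution time up without bound in both models.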

7.3 Critique of Current Methodology: Some Fundamental Problems and Limitations

Current  software  reliability  assessment  technology  suffers  from  some  fundamental  problems  and  limi¬ 
tations.  These  seem  to  stem  from  the  fact  that  software  reliability  assessment  has  not  established  itself  as 
a  discipline  in  its  own  right;  current  technology  remains  bound  to  the  hardware  reliability  assessment 
technology  from  which  it  evolved. 

7.3.1  Evolution  from  Hardware  Reliability  Assessment 

The  field  of  software  reliability  assessment  evolved  from  the  field  of  hardware  reliability  assessment. 
But  there  are  key  differences  between  software  and  hardware  that  limit,  or  should  limit,  the  extent  to 
which  the  hardware  concepts  can  be  applied  in  the  software  world.  Some  of  these  differences  and  their 
implications  are  described  below. 

7.3.1.1 Source of Failures

As noted in [Musa87], the source of hardware failures (at least of the hardware failures traditionally
addressed in hardware reliability assessment) is the physical aging and deterioration of hardware
components. The source of software failures is software faults, which are manifestations of design and
implementation errors. Hardware failures resulting from faults introduced by design errors share more in
common with software failures than with age-related hardware failures. Thus, the distinction could be made
on the basis of aging versus design instead of on the basis of hardware versus software.⁵

This  observation  has  a  critical,  but  largely  unheeded,  implication.  Hardware  reliability  is  inherently 
time  and  usage  dependent  (as  in  5-year,  50,000  mile  automobile  warranties).  Thus,  it  makes  sense  for 
hardware  reliability  to  be  defined  in  terms  of  failure-free  operation  over  a  specified  “exposure”  time 
period. 

Software  reliability,  on  the  other  hand,  is  not  directly  dependent  on  time.  First,  consider  software 
faults,  which  are  the  source  of  software  failures.  Assuming  that  the  software  is  not  changed,  software 
faults  are  time-invariant.  That  is,  software  faults  are  introduced  during  design  and  implementation;  new 
software  faults  do  not  arise  because  of  the  passage  of  time  or  occurrence  of  processing.  Now  consider 
software  failures.  Software  failures  are  dependent  on  inputs.  For  a  given  (deterministic)  program  and  a 
given  input  state,  either  the  software  always  operates  correctly  or  it  always  fails.  Time  is  not  a  factor, 
except in the sense that the software may be exposed to different inputs over time. Given the same operational
profile,  the  software  reliability  does  not  vary  with  time. 

So,  while  time  (in  some  unit  of  exposure)  appears  to  be  the  “right”  independent  variable  for  hardware 
reliability  assessment,  it  is  not  so  natural  for  software  reliability  assessment.  Software  reliability  is  depen¬ 
dent  not  on  time  directly,  but  on  (1)  the  presence  of  faults  and  (2)  the  exposure  to  faults,  more  precisely, 
the  exposure  to  input  states  that  lead  to  execution  paths  on  which  faults  are  encountered.  Both  of  these 
factors  have  to  be  taken  into  account  in  assessing  software  reliability. 
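
A small, deliberately contrived example may make the two factors concrete. The faulty routine, the input states, and the two operational profiles below are all hypothetical; the point is only that a fixed, deterministic program with a fixed fault has a different reliability under each profile, and that time enters only through the number of inputs processed.

```python
def faulty_abs(x):
    """Absolute value with a deliberate fault: wrong only for x == -1."""
    if x >= 0 or x == -1:
        return x
    return -x


def fails(x):
    """A run fails when the output departs from the requirement."""
    return faulty_abs(x) != abs(x)


def per_run_reliability(profile):
    """Probability that one run, drawn from the profile, does not fail.

    profile maps each input state to its probability of occurrence.
    """
    if abs(sum(profile.values()) - 1.0) > 1e-9:
        raise ValueError("profile probabilities must sum to 1")
    return 1.0 - sum(p for state, p in profile.items() if fails(state))


if __name__ == "__main__":
    rare_exposure = {-1: 0.001, 0: 0.499, 5: 0.5}
    frequent_exposure = {-1: 0.2, 0: 0.3, 5: 0.5}
    for profile in (rare_exposure, frequent_exposure):
        r = per_run_reliability(profile)
        # Reliability over 100 independent runs under the same profile.
        print(r, r ** 100)
```

The fault is identical in both cases; only the exposure to the failing input state changes.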

This view is reflected by Parnas [Parn88]. He suggests two complementary probabilistic measures of
software quality: reliability and trustworthiness. Software reliability is defined as “the probability of not
encountering an input history that causes a failure”; software trustworthiness is defined as “the probability
that no serious design error remains after a set of randomly chosen tests [have been] passed.”

Therefore,  the  emphasis  on  time  domain  models  of  software  reliability  may  not  be  appropriate,  espe¬ 
cially  in  the  case  of  highly  reliable  software.  For  highly  reliable  software,  the  goal  is  not  to  estimate  meas¬ 
ures  such  as  failure  rate,  but  to  assure  that  critical  failures  cannot  occur. 

7.3.1.2 Target of Assessment

In  hardware  reliability  assessment,  the  basic  targets  of  assessment  are  relatively  low-level  components, 
such as memory chips. The reliability of a given class of components (for example, a given type of
memory  chip  or  a  given  batch  of  a  given  type  of  memory  chip)  is  established  by  sampling  from  the  popula¬ 
tion  of  components  in  the  class.  The  components  in  a  class  are  identical  in  design  but  are  distinct  physi¬ 
cally.  It  is  assumed  that  the  components  of  a  class  have  the  same  reliability,  but  that  failures  in  the  indivi¬ 
dual  components  are  independent. 

5. However, in keeping with general practice, this section continues to use the hardware/software categorization, with the
understanding that hardware reliability is being used in the traditional sense of age-related reliability.

The low-level hardware components are used as building blocks in constructing higher level
components, or systems. The reliabilities of the building blocks can be used to estimate the reliabilities of
the system [Shoo^]. For example, the reliability of two series-connected components (that is, two
components, both of which must operate correctly) is the product of the reliabilities of the two components.
Again, the independence of failures in different components is the underlying assumption.
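
The series rule is easy to state in code. The sketch below is illustrative only; the function name is hypothetical, and the result is meaningful only under the independence assumption just described.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of series-connected components with independent failures.

    Every component must operate correctly, so the system reliability
    is the product of the component reliabilities.
    """
    values = list(component_reliabilities)
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie in [0, 1]")
    return math.prod(values)


if __name__ == "__main__":
    # Two components in series.
    print(series_reliability([0.99, 0.98]))
    # One hundred identical components in series.
    print(series_reliability([0.999] * 100))
```

Because the product can only shrink as components are added, a system of many series-connected parts needs each part to be considerably more reliable than the system-level requirement.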

Combinatorial  analysis  is  at  the  core  of  hardware  reliability  assessment.  In  fact,  it  is  the  building-block 
approach  in  conjunction  with  the  combinatorial  analysis  that  supports  the  construction  of  hardware  sys¬ 
tems  with  specified  levels  of  reliability. 

In  software  reliability  assessment,  on  the  other  hand,  the  basic  targets  of  assessment  are  relatively 
high-level  components.  As  noted  in  [Musa87], 


In  general,  it  appears  that  these  models  can  be  applied  to  any  type  or  size  of  software  pro¬ 
ject,  with  the  following  exception.  Very  small  projects  (less  than  about  5000  lines  of  code) 
may  not  experience  sufficient  failures  to  permit  accurate  estimation  of  execution  time  com¬ 
ponent  parameters  and  the  various  derived  quantities. 


Moreover,  the  high-level  components  are  “unique.”  Instead  of  being  designed  as  building  blocks,  they 
are application-specific. Thus, the combinatorial analysis that is fundamental to hardware reliability
assessment  does  not  carry  over  to  software  reliability  assessment. 

This distinction between hardware reliability assessment and software reliability assessment is recognized
by McCall, et al. Their articulation of the problem, repeated here for emphasis, is as follows
[McCa87a]:


Hardware components consist of separate parts, each of which may be used in many other
applications, such as a 1A 250V diode or a 16k dynamic [Random Access Memory] RAM
chip. Failure rates can be established for these parts either from test or from analysis of
field data. The procedures of MIL-STD-756B [Reliability Modeling and Prediction] assume
that the reliability of a component is the product of the reliability of its (series-connected)
parts. The software analog to this would be to test individual assignment, branching, and
[Input/Output] I/O statements and to declare the reliability of a procedure to be the product
of the reliability of its individual statements. This analog is faulty because:
(a) statements cannot be meaningfully tested in isolation and (b) many software failures
arise not from faults in a single statement, but rather from interactions between multiple
statements (or from interactions between hardware and software).


The  point  is  that  the  ultimate  objective  of  traditional  hardware  reliability  assessment  —  the  construc¬ 
tion  of  systems  with  specified  levels  of  reliability  —  cannot  be  accomplished  in  the  software  domain  in  the 
same  way  that  it  is  in  the  hardware  domain. 

7.3.2  Applicability  to  Life  Cycle  Phases 

In [Goel85], Goel discusses the applicability of software reliability models to the following phases of
the  life  cycle:  (1)  design  phase,  (2)  unit  testing,  (3)  integration  testing,  (4)  acceptance  testing,  and  (5) 
operational  phase.  Based  on  assumptions  made  by  the  various  models,  which  he  enumerates  in  the  arti¬ 
cle,  he  comes  to  the  following  conclusions.  Integration  testing  is  the  only  phase  where  all  four  categories 
of  models  —  times  between  failures  models,  failure  count  models,  fault  seeding  models,  and  input  domain 
based  models  —  are  applicable.  None  of  the  models  is  applicable  during  the  design  phase,  because  of  the 
lack  of  test  cases  and  failure  history.  None  is  applicable  in  practice  during  unit  testing,  although  fault 
seeding models and input domain based models are applicable in theory. During acceptance testing, the
failure count and input domain based models are applicable, while the others are not. Finally, during the
operational phase, only the failure count models are applicable.

The  point  here  is  that  the  current  methodologies  are  applicable  only  to  software  that  is  being  executed, 
and,  moreover,  only  at  integration  testing  and  later  in  the  life  cycle. 

7.3.3 Applicability to Highly Reliable Systems

The  acceptance  testing  of  highly  reliable  software,  which  is  a  crucial  aspect  in  the  development  of  the 
SDS,  presents  its  own  unique  problem.  Highly  (or  ultrahighly)  reliable  software  should  exhibit  no  (criti¬ 
cal)  failures  during  acceptance  testing.  In  this  case,  a  “failure  history”  does  accumulate,  but  it  is  one  of 
no failures. Are the models applicable to highly reliable software? Or are they applicable only to “unreliable”
software? The problem is captured in the following dilemma posed by Knight:⁶

Hypothesis  1.  For  a  system  that  is  required  to  achieve  very  high  reliability,  if  any  failure 
occurs  during  verification  testing,  then  the  system  will  never  achieve  the  required  level  of 
reliability.⁷


Hypothesis  2.  If  a  system  does  not  fail  during  testing  no  reliability  assessment  is  possible 
because  there  is  no  data. 

Hypothesis  2  is  clearly  an  overstatement.  Statistically  valid  conclusions  can  be  drawn  from  the  lack  of 
failures. According to Parnas [Parn88], testing can in theory be used to establish trustworthiness in
software.  However,  as  he  warns,  the  amount  of  testing  that  would  be  required  in  practice  is  simply  prohi¬ 
bitive. 
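
To give a sense of scale, the following sketch computes how many failure-free, randomly chosen tests are needed before a given bound on the per-test failure probability can be claimed with a given confidence. It uses a standard statistical argument (if the true failure probability were at least p, then n consecutive passes would occur with probability at most (1 - p)^n), which is an assumption of this sketch and not necessarily the derivation used in [Parn88]. The function name and the numerical values are illustrative.

```python
import math


def failure_free_tests_needed(max_failure_prob, confidence):
    """Smallest n such that (1 - p)**n <= 1 - confidence.

    After n randomly chosen tests with no failure, the hypothesis that
    the per-test failure probability is at least p can be rejected at
    the stated confidence level.
    """
    if not 0.0 < max_failure_prob < 1.0:
        raise ValueError("max_failure_prob must lie strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    # log1p keeps precision when max_failure_prob is very small.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_failure_prob))


if __name__ == "__main__":
    for p in (1e-3, 1e-6, 1e-9):
        print(p, failure_free_tests_needed(p, 0.99))
```

The required number of tests grows roughly in inverse proportion to the failure probability being ruled out, which is why the approach becomes impractical for ultrahigh reliability targets.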

7.3.4  Traditional  Uses  of  Software  Reliability  Assessment 

At  this  point,  it  is  appropriate  to  consider  the  role  that  software  reliability  assessment  traditionally 
plays in the software engineering process. In [Musa87], Musa, Iannino, and Okumoto cite the following
four uses of software reliability assessment:

•  To  (quantitatively)  evaluate  software  engineering  technology; 

•  To  evaluate  development  status  during  the  test  phases  of  a  project; 

•  To  monitor  the  operational  performance  of  software  and  to  control  new  features 
added  and  design  changes  made  to  the  software;  and 

•  To gain insight into the software product and the software development process,
through  a  quantitative  understanding  of  software  quality. 

Software reliability modeling has proved to be effective in the cited uses (especially the second and
third), which are valuable in certain environments, such as the American Telephone & Telegraph (AT&T)
environment in which Musa and others have applied the models.

6. This dilemma was posed at the IDA Testing and Evaluation Workshop that was held in support of this report. While it could of
course be more carefully stated, it does make its point.

7. Hypothesis 1 holds only if very high reliability means 100% reliability, or absence of failures.

7.4 Conclusion

This  section  presents  a  summary  evaluation  of  current  software  reliability  assessment  technology, 
based  on  the  above  critique.  It  then  suggests  directions  for  future  research.  Finally,  it  closes  by  attempt¬ 
ing  to  put  software  reliability  assessment  into  a  proper  perspective. 

7.4.1  Summary  Evaluation  of  Current  Software  Reliability  Assessment  Technology 

Clearly,  the  preceding  critique  of  software  reliability  assessment  technology  raises  questions  about  the 
applicability  of  the  current  failure-history-based  methodology  to  highly  reliable  systems.  As  Goel  con¬ 
tends  in  a  position  statement  prepared  for  the  IDA  Testing  and  Evaluation  Workshop  [Bryk89]: 


The  current  methodology  for  evaluating  software  reliability  is  based  on  a  very  restricted 
premise,  viz,  the  future  error  occurrence  phenomenon  is  a  stochastic  extrapolation  of  the 
recent  past.  This  approach  is  too  simplistic  and  is  not  likely  to  be  very  useful  for  ultra-high 
reliability  systems,  such  as  [the]  SDS. 


More  significantly,  this  critique  raises  questions  about  the  philosophy  and  assumptions  underlying 
today’s  software  reliability  assessment  technology.  It  can  be  argued  that  current  software  reliability 
assessment  methodologies,  as  well  as  the  definition  of  software  reliability  itself,  are  “artifacts”  of  the  evo¬ 
lution  of  software  reliability  assessment  from  hardware  reliability  assessment.  Design  faults  (be  they 
hardware or software) demand a new approach; they cannot be treated adequately by the same methodology
that was developed for age-related faults.

The  current  definition  and  methodologies  constrain  what  can  be  done  in  the  context  of  software  relia¬ 
bility  assessment.  In  particular,  current  methodologies  can  be  useful  only  as  management  tools,  in  accom¬ 
plishing  purposes  such  as  estimating  project  schedules,  optimizing  the  allocation  of  project  resources, 
and  optimizing  the  timing  of  new  releases  of  software.  Granted,  these  are  worthwhile  purposes;  the 
acceptance  of  current  software  reliability  assessment  technology  stems,  in  large  part,  from  its  success  as  a 
management tool in exactly these types of applications.

However,  the  ultimate  goal  of  any  reliability  assessment  technology  should  be  to  facilitate  and  assure 
the  construction  of  systems  of  specified  levels  of  reliability.  Because  of  the  fundamental  limitations  of 
current  software  reliability  assessment  methodologies  in  this  regard,  further  work  on  enhancing  current 
methodologies  is  unlikely  to  yield  satisfactory  results. 

7.4.2  Future  Directions  of  Software  Reliability  Assessment  Technology 

In  order  for  software  reliability  assessment  technology  to  contribute  significantly  to  SDS  development, 
the  software  engineering  community  must  support  the  evolution  of  the  concept  of  software  reliability. 
Software  reliability  assessment  must  move  beyond  the  goal  of  estimating  future  failure  behavior  based  on 
past  failure  behavior  toward  the  goal  of  constructing  reliable  software  and,  ultimately,  reliable  systems. 

7.4.2.1 Assessment of the Software Development Process

Just  as  it  is  not  possible  to  “test”  correctness  into  software,  it  is  not  possible  to  test  reliability  or 
trustworthiness into software. So, what can be done? As suggested by Parnas [Parn88] and Evangelist
[Bryk89], the solution lies in the software development process. Parnas maintains [Parn88]:

Software  can  be  used  in  safety-critical  applications  but  extreme  discipline  in  design,  docu¬ 
mentation,  testing  and  review  are  needed.  Standard  practice  is  not  adequate. 

Because  of  the  inevitable  reliance  upon  the  software  development  process  as  the  most  powerful  means 
of  constructing  reliable  software,  software  reliability  assessment  technology  should  shift  focus  —  from 
assessment  of  the  reliability  of  individual  software  products  to  assessment  of  the  reliability  of  software 
engineering methodologies, practices, tools, and techniques.⁸ Then, the software engineering community
could  begin  to  approach  the  ultimate  goal  of  the  construction  of  reliable  software. 

Of  course,  this  is  a  most  challenging  task.  It  involves  capturing,  recording,  and  analyzing  features  of 
the  software  development  process,  throughout  the  life  cycle.  In  most  of  the  software  measurement  work 
that  has  been  done  to  date,  the  completed  software  product  rather  than  the  entire  software  development 
process is the target of measurement and analysis. Efforts have concentrated on measuring isolated,
low-level features of software products. While the low-level features are readily measurable, the significance
of  their  measured  values  is  questionable. 

In  undertaking  the  task  of  assessing  the  software  development  process,  the  following  points  should  be 
kept  in  mind: 

•  Methodologies and practices, in order to be compared, must be rigorously defined
and  faithfully  followed. 

•  “Desirable”  properties  of  software,  such  as  reliability,  must  be  identified;  then, 
effective  measures,  which  can  quantify  these  properties,  must  be  defined. 

•  Effective  experimental  design  must  be  employed. 

7.4.2.2  Assessment  of  Software  Reliability  in  a  System  Context 

Software  reliability  must  be  assessed  in  a  system  context.  As  noted  in  Section  2.1,  the  correctness  of 
software  for  distributed,  real-time  applications  cannot  be  established  in  isolation  from  the  underlying 
computing  system,  for  two  distinct  reasons.  First,  in  real-time  applications,  the  correctness  of  software 
entails  not  only  the  values  of  results,  but  also  the  time  at  which  results  become  available.  Timing,  of 
course,  is  a  function  of  the  software,  as  well  as  of  the  underlying  computing  system. 

Second,  in  complex  distributed  real-time  systems,  hardware  components  are  bound  to  fail,  software 
faults  are  bound  to  exist,  and  unexpected  inputs  are  bound  to  occur.  The  software,  as  the  controlling  ele¬ 
ment  of  the  system,  must  be  designed  to  deal  with  these  faults  and  failures.  The  software  must  provide  for 
system  fault  tolerance  and  graceful  degradation. 


8.  Here,  the  reliability  of  a  methodology  means  the  reliability  afforded  by  the  methodology  to  the  software  product  being 
developed. 

Therefore, just as the correctness of software depends on its ability to meet timing constraints, its
correctness also depends on its ability to degrade gracefully in the face of faults, failures, and unexpected
inputs. The problem lies in the quantification of “graceful degradation.” Intuitively, the concept involves
a mapping of the domain of potential faults/failures into a range of “mission impairment.” For a specified
subset of the domain, the software should be able to assure a mission impairment of zero. Beyond that
subset of the domain, the mission impairment should not rise “dramatically,” but “gracefully.”

The issue of fault tolerance must be addressed in all phases of the life cycle:

•  Potential  faults  and  failures  must  be  identified  at  the  outset. 

•  The required level of fault tolerance must be specified, perhaps in terms of the
“graceful degradation” curve.

•  Fault tolerant techniques must be incorporated into the software, to provide for
the required level of fault tolerance.

•  The impact of faults and failures, as well as the effectiveness of fault tolerant
techniques, must be assessed during design (possibly through simulation), as well as
during testing.

•  The  reliability  afforded  by  various  fault  tolerant  techniques  must  be  assessed,  in 
the  same  way  that  the  reliability  of  other  software  engineering  methodologies 
should  be. 

7.4.2.3  Assessment  of  System  Reliability 

Finally, software reliability must be taken into account when assessing system reliability. Often, system
designers assume a software reliability of 100% when evaluating system reliability [Pam88]. Such an
assumption is clearly unreasonable. Innovative approaches are needed here. It is not sufficient to follow
the much-touted practice of simply casting software reliability in hardware reliability terms and then using
combinatorial analysis to derive system reliability. The distinction between design faults and age-related
faults must be considered.
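
A small numerical sketch shows how much the 100% assumption can inflate a system figure. It uses the simple series-system product of component reliabilities, which is exactly the kind of combinatorial shortcut the text warns is insufficient on its own. It is shown only to illustrate the effect of the assumption, not as a recommended model. The function name and all reliability values are hypothetical.

```python
import math
from typing import Iterable


def series_reliability(component_reliabilities: Iterable[float]) -> float:
    """Reliability of a series system: every component must work."""
    return math.prod(component_reliabilities)


# Assumed, illustrative hardware reliabilities for two components.
hardware = [0.99, 0.98]

# System figure when software is assumed to be perfectly reliable.
optimistic = series_reliability(hardware + [1.0])

# System figure with an assumed (hypothetical) software reliability of 0.95.
with_software = series_reliability(hardware + [0.95])

if __name__ == "__main__":
    print(f"software assumed perfect: {optimistic:.5f}")
    print(f"software at 0.95:         {with_software:.5f}")
```

Even in this simplified model, the assumed software term dominates the result. The product form also treats software faults as if they were independent, age-related failures, which is the distinction the text says must be reconsidered.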

7.4.3  Caveat 

In  closing  this  section,  it  is  appropriate  to  place  software  reliability  assessment  (and,  more  generally, 
software  testing  and  evaluation)  in  perspective.  Specifically,  it  is  important  to  recognize  or  acknowledge 
what software testing and evaluation cannot accomplish, so that unrealistic expectations do not prevail.

Consider the concept of software reliability. In its broadest sense, software reliability is equated with
the probability of “mission success.” It is assumed that highly reliable software will successfully perform
the intended mission. This sense of the concept is intuitively appealing, as evidenced by the fact that
discussions on software reliability, especially in the context of SDS, almost always degenerate into
discussions on the feasibility of building a successful system; but it leads to inflated expectations of what
software reliability assessment, as well as software testing and evaluation, can accomplish. The
probability of mission success depends on factors that fall outside the scope of software and the traditional
software testing and evaluation process. These factors include the following:

•  Validity of Input Assumptions: Software is developed to respond to specified
inputs; in the case of the SDS, for example, a specified threat. If the specified
threat differs from the actual threat, then the fact that the software is “ultrahighly
reliable” for the specified threat indicates little or nothing about the reliability of
the software for the actual threat.

•  Validity of Hardware Assumptions: Hardware (and, more generally, environmental)
assumptions have the same impact as input assumptions. If they are invalid,
then the reliability of the software with respect to those assumptions indicates
little about reality.

•  Effectiveness of Strategy: Strategy is embodied in software. How can the
reliability (or effectiveness) of a strategy be quantified? Are probabilities of success
computed for non-automated (defensive or military) strategies? Software
reliability can hardly be expected to subsume quantification at this level.

In short, traditional software testing and evaluation can offer some assurance that a software product
accurately implements a proposed solution to a specified problem, but cannot assure that the problem is
adequately specified or that proposed solutions are in some sense “good.” As pointed out in Section 2,
the most effective approach for dealing with these difficult issues is an iterative development approach,
incorporating formal design specifications, simulation, and prototyping. This approach has been adopted
by the SDIO. It is important that software testing and evaluation research and development focus on
supporting this approach.


8. RECOMMENDED TASKS TO EXPLOIT EXISTING TECHNOLOGY

This section of the report identifies a number of tasks required to ensure that SDS software testing and
evaluation attains the maximum effectiveness and efficiency achievable within the current bounds of
technology. These tasks do not fall into classifications of dynamic and static analysis, formal verification,
measurement, or reliability assessment. Instead, they cut across these distinctions to provide a common
framework into which desirable elements of each type of available technology will fit. Tasks to extend
technology are discussed separately in Section 9.

The  gap  between  the  state-of-the-art  and  practice  in  testing  and  evaluation  is  very  wide.  Consequently, 
it  should  be  possible  to  effect  a  substantial  improvement  in  software  reliability  by  requiring  the  use  of  a 
core  set  of  advanced  techniques,  supported  by  high-quality,  effective  automated  tools. 

To achieve this goal, this section outlines four major tasks. The first task focuses on the critical need to
integrate testing and evaluation activities into software development as a whole. It also addresses how
software developers should be provided with explicit guidance as to which techniques are appropriate
under certain circumstances and how these techniques should be applied to achieve the necessary
confidence in results at acceptable cost. The second task discusses the requirement for an SDS Software
Data Collection System which will capture and assess information about not only the software being
developed, but also the technology used to develop, test, and support that software. The third task identifies
some of the issues involved in the development of an automated environment to support testing and
evaluation of SDS software. In many respects, the final task provides crucial support for the three
preceding ones. It concerns the exploitation of process modeling techniques to both explore and specify
effective, flexible ways of integrating testing and evaluation into software development activities. While
primarily intended to exploit available technology, these tasks do themselves require some advances in
technology. For example, although a sophisticated environment will, at least initially, support application
of available testing and evaluation techniques, its development requires increased understanding of the
ways in which techniques can be cooperatively applied.

Before  proceeding  to  discuss  these  tasks  in  more  detail,  a  word  of  caution  is  appropriate.  There  are 
many  important  research  results  which  have  never  had  the  benefit  of  significant  prototype  development 
and  exploration.  There  is  an  important  need  for  more  thorough  experimentation  with  these  research 
ideas,  and  this  is  best  accomplished  by  transferring  the  ideas  from  the  academic  research  setting  in  which 
they  were  developed  to  advanced  technology  development  laboratories.  New  technology  must  initially  be 
applied  on  a  few  selected  software  efforts  before  being  required  for  general  practice.  This  will  provide  an 
opportunity  to  carefully  monitor  the  application  of  the  technology  to  determine  its  benefits  and  costs  in 
both  technical  and  programmatic  senses.  These  testbed  sites  should  be  typical  software  development 
efforts where software developers work under realistic deadlines to develop “real” code. The
introduction of new technology can incur cost and schedule penalties. These risks must be reflected in contracts,
and software developers must be provided with incentives to explore the full potential of the new practices.
Additionally, before new technology is inserted into SDS practices, the appropriate policies and organizational
support must be in place. Technology transition and insertion is a difficult and expensive activity. It is,
however,  a  necessary  prerequisite  to  advancing  the  state-of-the-practice.  A  separate,  strongly  funded 
technology  transfer  effort  that  runs  in  parallel  with  R&D  efforts  must  be  instituted. 

Although  primarily  intended  for  SDS  software,  the  technology  discussed  here  offers  increased 
effectiveness  for  all  software  testing  and  evaluation  efforts. 
8.1 Test Planning and Testing Requirements

An initial model of a possible, promising SDS life cycle development process was given in Section 2.3.
The role that a sample set of the various techniques discussed in this report plays in the testing and
evaluation embedded in this process is shown in Figure 8-1. This figure is not meant to imply that all the
identified techniques are those specifically recommended for use, but to illustrate the types of testing and
evaluation techniques that can be applied during various development activities. It provides an initial
picture of how testing and evaluation activities can be effectively integrated into the development process to
provide timely feedback on development activities. Many issues require further investigation. For
example, what role should simulation play in the testing and evaluation scheme? A specific set of candidate
techniques suitable for each stage of software development should be identified. As data on the
respective costs and benefits of these candidate techniques becomes available, those required for SDS software
testing and evaluation should be determined.

Figure 8-1. Process Model and Candidate Techniques

In  particular,  for  SDS,  this  process  model  should  be  extended  to  provide  support  for  determining 
which  SDS  engineering  products  must  be  subject  to  testing  and  evaluation,  and  the  level  of  effort  required 
for  different  products.  Additionally,  the  development  process  model  must  be  supported  by  a  model  of  the 
improvement  process  which  defines  such  activities  as  assessing  engineering  trade-offs  to  determine,  for 
example,  where  testing  dollars  should  be  allocated. 

One of the mechanisms proposed for implementing this model is the formalized use of system and
software test plans. It is vital that the necessary programmatic practices to implement this test plan
concept be investigated.

The other mechanism is the use of formal testing requirements imposed at each development stage.
These requirements should provide precise specification of testing objectives which enables quantitative
assessment of both testing and evaluation progress and outstanding needs. They should not only guide the
selection of the appropriate techniques for software under test, but also guide the application of these
techniques. Traceability from system requirements, through testing requirements, down to individual
testing objects and activities is necessary both to support monitoring of the overall test and evaluation status
and to facilitate regression testing.
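
The traceability chain described above can be sketched as a pair of lookup tables. The following Python fragment is a minimal illustration, assuming that system requirements, testing requirements, and test cases are each identified by a string label. The class `TraceMatrix`, its methods, and the identifiers used in the example are hypothetical. The sketch shows how such links could be used to select the tests to repeat when a system requirement changes.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class TraceMatrix:
    # system requirement id -> ids of testing requirements derived from it
    sys_to_test_req: Dict[str, Set[str]] = field(default_factory=dict)
    # testing requirement id -> ids of test cases that address it
    test_req_to_cases: Dict[str, Set[str]] = field(default_factory=dict)

    def link(self, sys_req: str, test_req: str, cases: Iterable[str]) -> None:
        """Record one path: system requirement -> testing requirement -> tests."""
        self.sys_to_test_req.setdefault(sys_req, set()).add(test_req)
        self.test_req_to_cases.setdefault(test_req, set()).update(cases)

    def regression_set(self, changed_sys_reqs: Iterable[str]) -> List[str]:
        """Return the test cases traceable to any changed system requirement."""
        selected: Set[str] = set()
        for sys_req in changed_sys_reqs:
            for test_req in self.sys_to_test_req.get(sys_req, set()):
                selected |= self.test_req_to_cases.get(test_req, set())
        return sorted(selected)


# Illustrative, made-up identifiers.
matrix = TraceMatrix()
matrix.link("SR-1", "TR-1", ["t1", "t2"])
matrix.link("SR-1", "TR-3", ["t2", "t4"])
matrix.link("SR-2", "TR-2", ["t3"])

if __name__ == "__main__":
    print(matrix.regression_set(["SR-1"]))
```

The same tables, read in the other direction, would support status monitoring: a testing requirement with no linked test cases is an outstanding need.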

Notations and practices for specifying and using testing requirements must be developed. Although the
requirements will guide the use of existing technology, identifying what information should be captured,
how it should be represented, and how it should be used is a difficult problem. For example, one of the simplest ways
(though in practice insufficient by itself) to specify the minimum level of dynamic analysis for a piece of
sequential software is to provide structural coverage measures. Different types of coverage measures are
appropriate for unit, integration, and system testing. Whereas coverage for unit testing is typically
assessed based on the control and data elements exercised, the proportion of program unit invocations
exercised, or the proportion of possible sequences in which they are invoked, is a better measure for
integration testing. A further degree of abstraction is appropriate for assessing the coverage of system
testing. Here coverage measures reflecting the system functions exercised should be used, where
functions are ranked to reflect their criticality to the system mission and the possible severity of the
consequences of their failure.
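
The three levels of coverage described above can be illustrated with a short sketch. It assumes that the elements to be covered (branches for unit testing, caller/callee pairs for integration testing, system functions for system testing) have already been enumerated, and that criticality is expressed as a numeric weight per function. The function names, element labels, and weights are hypothetical.

```python
from typing import Dict, Hashable, Iterable


def coverage(exercised: Iterable[Hashable], total: Iterable[Hashable]) -> float:
    """Proportion of the enumerated elements that were exercised."""
    total_set = set(total)
    if not total_set:
        return 1.0  # nothing to cover
    return len(set(exercised) & total_set) / len(total_set)


def weighted_function_coverage(weights: Dict[str, float],
                               exercised: Iterable[str]) -> float:
    """System-level coverage in which each function counts by its criticality."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 1.0
    exercised_set = set(exercised)
    covered = sum(w for name, w in weights.items() if name in exercised_set)
    return covered / total_weight


# Unit level: control elements (here, branches) exercised.
unit_cov = coverage({"b1", "b2", "b3"}, {"b1", "b2", "b3", "b4"})

# Integration level: program unit invocations (caller, callee) exercised.
integration_cov = coverage({("A", "B")}, {("A", "B"), ("A", "C")})

# System level: functions weighted by an assumed criticality ranking.
system_cov = weighted_function_coverage(
    {"track": 5, "discriminate": 3, "log": 2},
    exercised={"track", "log"},
)

if __name__ == "__main__":
    print(unit_cov, integration_cov, system_cov)
```

As the text notes, such measures are insufficient by themselves. The sketch only shows that the same proportion can be computed over increasingly abstract elements.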

Even if simple coverage measures were a sufficient means for specifying the required degree of dynamic
analysis, hierarchies of coverage measures balancing the extent of required testing against the criticality of
the software under test and the cost of achieving different levels of coverage would be needed. There are
hierarchies which can be exploited for this purpose, but there is little experience to support mapping
levels of criticality to levels of required coverage. What are the factors that determine criticality, what
discriminates between different degrees of criticality, and how should criticality be stated? Having
determined the coverage required, which dynamic analysis techniques should be employed, and how extensive
should the test data used for each be?

A related issue concerns integrating the results of testing and evaluation performed at different stages of
development, and using different techniques, to determine the sufficiency of completed testing and
evaluation and to provide an overall view of the software status. For example, in the case of measuring
software properties, the use of a common, underlying base set of metrics is vital since trade-offs between
software properties are inevitable and the overall characteristics of the system will be largely determined
from evaluation of individual subsystems and components. How should testing requirements and results
be captured as permanent attributes of a program, supported by sufficient details to repeat the testing at
need? How can the flexibility necessary to allow evolution of the testing practices embedded in the
development process be provided? These, and other, questions must be addressed before the testing
requirements mechanism can be institutionalized.

8.2  SDS  Software  Data  Collection  System 

It is extremely important that a program-wide SDS software data collection system (SSDCS) be
established. The SSDCS should be similar in nature to the measurement and collection capabilities of the
NASA SEL. As such, it will not only support analysis of SDS software, but also serve as a valuable resource
for the better understanding and advancement of the technology used to develop, test, and support that
software.

The SSDCS will provide the focal point for investigating the composition, effectiveness, and
applicability of the SDS software measurement program. An overall SDS measurement strategy must be
developed which provides proper support for the understanding, evaluation, and control of SDS
software. This strategy must include a validated set of metrics collected within a measurement
methodology which is integrated into development activities. In addition, the measurement strategy must be
continuously monitored for effectiveness and applicability. The SSDCS will perform experiments and
demonstrations to develop, tailor, and validate those elements of the measurement strategy.

Although the SSDCS is primarily intended to improve understanding of existing technology, data which support the
advancement of technology should also be collected. An initial set of technology questions, or goals,
which need to be addressed is shown in Figure 8-2. One of the lessons learned in metrics research is to
collect only the data that are needed for a specific purpose. A large volume of data often leads to
superficial analysis, which does not contribute to understanding. Consequently, the data required to
achieve the technology-related goals of the SSDCS must be clearly defined, and the costs associated with
collecting and analyzing that data weighed against the potential benefits. While a part of the data
collection will, of necessity, require manual procedures, development and testing environments must be
instrumented to automate data collection to the maximum extent possible. Instrumentation requirements for
these environments must be defined. The specific programmatic support needed to establish and maintain
an SSDCS must also be determined, including consideration of the diverse topics of organizational roles
and responsibilities, contractual implications, and policy directives.

8.3  Automated  Testing  and  Evaluation  Environment 

As the field of testing and evaluation technology matures, automated support is no longer merely
desirable, but increasingly a prerequisite for the application of today’s sophisticated techniques. Software
developers need access to a collection of useful testing and evaluation tools with the capability to build an
evolving picture of the status of the software under test.

The provision of a comprehensive testing environment which supports all aspects of preparing for,
executing, and reporting on testing and evaluation activities is a high-priority goal. Requirements for the
testing environment must be determined. Issues of technique integration, generic components, incremental
support, language independence, user interaction models, and environment support have been identified
previously. Another important concern is the relationship between the different forms of testing and
evaluation technology. For example, dynamic analysis approaches traditionally employ inductive
methods, whereas formal verification employs deductive methods. This distinction is narrowing as more
dynamic analysis techniques use deductive methods to identify the test data necessary to execute selected
paths. Thus, the symbolic evaluation that is the front end of formal verification is becoming a crucial
element of many analysis techniques. Similarly, some testing and evaluation activities can borrow from
compiler technology, or even be provided through compiler extensions. Test management should be
embedded in the environment, leading to proactive tools which guide the user in the application of appropriate
techniques for the case in hand. (Process programming should be investigated as a mechanism for
achieving this and earlier stated goals.) Finally, the development of efficient testing algorithms which can exploit
the capabilities of supercomputers to facilitate the use of computationally intensive testing techniques
should also be examined.

Dynamic and Static Analysis Objectives:

1.  Develop a series of well-understood weightings of programs that distinguish them in terms
of their analysis difficulty.*

2.  Collect data to confirm/refine capability profiles on techniques and tools.

Formal Verification Objectives:

1.  Determine model parameters for estimating costs and schedules for development of
formally verified software components.

Measurement Objectives:

1.  Support development of tailorable metrics for specific application domains.

2.  Build metric-based models of the development processes and products and use them for
improvement.

Reliability Assessment Objectives:

1.  Support development of a model that predicts reliability from the characteristics of the
software development and testing process.*

General:

1.  Gather significant experience with testing and evaluation of large systems that reveals
empirical principles which can minimize its cost, or increase its effectiveness at the same
cost.*

2.  Develop a psychology of testing and evaluation which aids in designing and
implementing organizations for its effective performance.*

3.  Evaluate the cost and benefits of the technology introduced.

Figure 8-2. Technology Objectives for the SSDCS

9.  Note: These objectives (marked * above) were first identified by Miller in the late 1970’s [Mill79a], though here they are slightly modified.

Development of such an environment is a significant undertaking, and the possibility of building on
existing efforts should be carefully investigated. The following is a list of only a few of the current efforts
that are of particular interest in this respect:

•  Dynamic  and  Static  Analysis: 

—  The  Arcadia  and  TEAM  environments,  and 
—  The  system  development  environment  from  Stanford  University. 

•  Formal Verification:

—  The Annotated Verifiable Ada (AVA) system being developed by
Computational Logic, Inc.

—  The  Ulysses  project  at  Odyssey  Research  Associates. 

•  Measurement: 

—  The  Software  Metrics  Data  Collection  System, 

—  Basili’s and Rombach’s goal-oriented approach, and

—  AdaMAT. 

•  Reliability  Assessment  (the  measurement  efforts  noted  above  also  apply  here): 

—  The  RADC  Software  Reliability  Measurement  Framework, 

—  Revision  of  MIL-STD-785B  (Reliability  Program  for  Systems  and  Equipment 
Development  and  Production),  by  EIA  Committee  on  System  Reliability 
(G41),  Subcommittee  on  Software  Reliability,  and 

—  IEEE  STDs  982.1  (Dictionary  of  Measures  to  Produce  Reliable  Software)  and 
982.2  (Guide  for  the  Use  of  Measures  to  Produce  Reliable  Software). 

While the testing environment must be available prior to full scale development of SDS software, its
availability is unlikely to significantly precede the full scale engineering phase. In the interim, a primitive
environment should be assembled from available tools. The primary goals of this interim environment
are to (1) provide immediate support to ongoing SDS efforts, (2) start the technology transfer process, (3)
support collection of quantitative information on the capabilities of current techniques and tools, and (4)
allow early experimentation with processes aimed at effective integration of testing and evaluation in the
development life cycle. Although unable to provide the efficient and effective testing and evaluation
expected of a carefully defined environment, such a collection of tools offers a significant improvement
over current practices.

Examples  of  a  few  candidate  tools  (taken  from  those  mentioned  in  this  report)  for  inclusion  in  such  an 
interim  environment  are: 

•  Dynamic  and  Static  Analysis: 

—  The  AdaPIC  toolset,  and 

—  The  MOTHRA  mutation  testing  system. 

•  Formal  Verification: 

—  The  Gypsy  Verification  Environment. 

•  Measurement  and  Reliability  Assessment: 

—  The  TAME  Environment. 

—  The  Ada  Test  and  Verification  System  (ATVS). 

These example candidates have been selected on the basis of (1) support for testing and evaluation of Ada
code, and (2) the possibility that they require less than 6 man-months of effort to reach at least advanced
prototype status. Additional candidates must be identified and all evaluated with respect to SDS software
testing and evaluation needs. The cost to apply the tools and provide them routinely to software
developers on a Government Furnished Equipment (GFE) basis should also be investigated. A query to
researchers in the different areas of testing and evaluation to identify additional candidates has already been
initiated.

8.4  Process  Modeling 

Testing and evaluation processes must be well-specified. This is necessary to allow such benefits as
universally understood testing and evaluation practices and meaningful monitoring of testing and
evaluation activities. It is also necessary to facilitate the adoption of testing and evaluation practices
which can expand and grow with technology and to further the understanding of that technology. The
tasks to exploit technology just discussed exemplify this need. Indeed, to be effective, they require these
capabilities as a necessary precursor.

Process modeling, in particular process programming, should be investigated as a mechanism for
achieving these goals. Existing techniques and experience in process programming, such as those gained
on the Arcadia project, should be exploited for this purpose. An activity to define SDS testing and
evaluation processes should be undertaken. An additional activity to investigate the specification of an effective,
flexible SDS software development model that fully integrates testing and evaluation activities is also
recommended.

9. EXTENDING THE BOUNDARIES OF TECHNOLOGY

The tasks in Section 8 provide a framework for bringing existing technology into SDS practice and
promoting a better understanding of the basic interrelationships between development and testing and
evaluation activities. This approach to carefully considered innovation holds great promise. Even so,
current gaps in testing and evaluation technology preclude confident deployment of a reliable SDS.
Fundamental research to resolve critical deficiencies is urgently required. In particular, technology for testing
and evaluation of large, concurrent and real-time software is largely nonexistent for practical purposes.

Here again, three major tasks, or more properly task areas, are recommended. First, the SDS faces
specific technical problems which require practical solutions in the relatively near-term. A series of
technology demonstrations to investigate the capabilities of emerging technology to solve these problems is
proposed. The second task area presents a number of areas where fundamental R&D is needed to address
shortfalls in technology. Finally, a series of tasks to monitor ongoing testing and evaluation research
efforts is recommended. All these task areas concentrate on well-focused research tasks to be conducted
over the next 5 years. It is expected that these tasks will lead to the recognition of additional R&D efforts
which should be supported over a much longer timeframe.

Unfortunately, the software testing and evaluation research community is too small and too weak at
present to rise to the challenges of SDS software testing and evaluation. The community must be
strengthened and expanded as quickly as possible. This report has identified the need for expanded
research, development, technology transfer and productization. All these require significant infusion of
resources. More than just money is needed, however. Even if contracts were let to perform all of the work that is
needed, there would not be enough researchers in a position to perform the contracts. The SDIO should
consider taking the lead in encouraging other DOD agencies to join with them in building up the testing and
evaluation research community to attack the critical problems surrounding highly reliable software.

9.1  Technology  Demonstrations 

There are a number of areas where the practical use of emerging technology could provide increased
understanding of that technology to facilitate its advancement. Similarly, when researchers and software
developers are required to address specific problems in a practical arena, they are likely to gain increased
understanding of the problem which, in turn, provides valuable insights into possible solutions, or
supports the development of a working solution pending necessary theoretical advances.

A series of technology demonstrations which require solutions to specific technical problems is
recommended. This proposal is in keeping with the planned development of SDS; current activities, as a whole,
are either technology demonstrations or experimental developments. Testing and evaluation technology
demonstrations should be conducted on recognizable components of the SDS with a view to potentially
providing practically useful products. There are several goals which should be applied to all
demonstrations, such as requiring increased use of formalism throughout the life cycle.

An initial list of candidate problems to be investigated is given in Figure 9-1. In each case, the specifics
of a suitable technology demonstration must be determined so that a decision to pursue a demonstration
can be firmly based on projected costs and benefits. Not all of the problems must be addressed
individually; the ability to define demonstrations which tackle a combination of problems must be investigated.
The possibility of exploiting current software efforts (both SDS and other DOD efforts) to provide
vehicles for these demonstrations should be considered.




9.2  Research  and  Development  Tasks 

Resolution  of  the  major  gaps  in  testing  and  evaluation  technology  requires  fundamental  advances  in  the 
underlying concepts. While the SDIO should not require research simply for research’s sake, some
pure  R&D  efforts  are  necessary  to  find  answers  to  SDS  software  testing  and  evaluation  problems. 

Brief descriptions of an initial set of recommended R&D tasks are given below.

•  Dynamic and Static Analysis Problems:

—  Develop  techniques  for  designing  testable  software  and  measuring  achieved  degree  of  testability. 

—  Develop/investigate  specification  languages  and  techniques  which  can  provide  an  oracle  capability  to  support 
the  dynamic  analysis  of  products  from  later  development  activities. 

—  Develop  a  hierarchy  of  coverage  measures  which  map  against  levels  of  criticality  in  SDS  software. 

—  Perform a fault tree analysis of a sample SDS element and use it to determine how to specify impact on testing
requirements and the need for fault-tolerance, fail-soft, and fail-safe mechanisms and procedures.

—  Identify  SDS  software  elements  which  require  permanent  runtime  self-test,  develop  methods  for  specifying 
self-test  requirements  and  techniques  for  validating  achievement  of  these  requirements. 

•  Formal  Verification  Problems: 

—  Verified  Ada  run-time  support  systems. 

—  Verified  secure  communications  over  noisy  channels. 

—  Verified  distributed  and  autonomous  systems. 

•  Measurement  and  Reliability  Assessment  Problems: 

—  Development  of  a  Comprehensive  Measurement  Methodology. 

—  Integration  of  Measurement  into  Software  Development. 

—  Automated  Tool  Support. 

Figure  9-1.  Candidate  Problems  to  be  Addressed  in  Technology  Demonstrations 


9.2.1  General  R&D  Tasks 

1. Increased Formalism for Early Lifecycle Products. Current specification technology does not support
timely feedback to development activities or early identification of errors. The SDIO has recognized this
problem and provided one step forward by requiring use of an architectural design language (SADMT)
which supports design-to-simulation capabilities. This report itself has identified the need for formal testing
requirements. Formal languages for specifying both system and software requirements and designs are
also needed. Existing languages should be investigated to identify a minimal subset which can be recommended
for use on SDS efforts.

Existing  testing  and  evaluation  technology  that  can  be  applied  to  these  pre-implementation  descriptions 
should  be  identified.  It  is,  however,  likely  that  new  techniques  and  tools  geared  towards  these  more 
abstract  descriptions  will  be  required.  Although  it  is  doubtful  that  a  single,  say,  system  requirements 
language  will  be  sufficient  for  the  needs  of  all  the  diverse  types  of  SDS  elements,  a  small  common  subset 
of  system  requirements  languages  is  desirable  so  that  researchers  and  software  developers  can  focus  on 
supporting  only  a  few  languages.  The  special  needs  for  each  type  of  language  should  be  identified. 

2. Testing and Evaluation Process Programming. Process programming [Oste87a] involves encoding
software development processes in a rigorous language, such as a programming language. Such formalized
specification of testing and evaluation processes offers several benefits. For example, when the
necessary interactions between software developers and testing and evaluation tools are captured, it
becomes possible for a testing environment to explicitly guide testing activities, rather than merely
respond to the commands given by a software developer. Moreover, the act of writing process programs
promotes a better understanding of the underlying activities and allows this understanding to be widely
disseminated. While high-level testing and evaluation process programs can be independent of particular
software efforts, at a more detailed level they must usually be tailored to a specific effort or testing
environment. Indeed, the maximum benefits from testing and evaluation process programs will only be
obtained when a testing environment is designed to exploit this capability, and vice versa. Consequently,
an activity to define and develop process programs to support SDS software testing and evaluation activities
should proceed in parallel with the development of an SDS testing environment (see Section 8.3).
Over the last few years, the researchers undertaking the development of the Arcadia and TEAM environments
have gained much experience in process programming which could be used to facilitate this task.
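
As a rough illustration of what encoding a testing process in a programming language might look like, the sketch below lets a small program, rather than the developer, decide which activity comes next. The step names, the ProcessState fields, and the 0.8 coverage threshold are hypothetical and chosen only for illustration; they are not taken from Arcadia, TEAM, or any SDS process definition.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessState:
    """What is known so far about one software unit under test (hypothetical fields)."""
    completed: list = field(default_factory=list)
    static_ok: bool = False
    tests_ok: bool = False
    coverage: float = 0.0


def next_step(state):
    """Return the activity the environment should direct next, or None when done."""
    if not state.static_ok:
        return "run_static_analysis"
    if not state.tests_ok:
        return "run_unit_tests"
    if state.coverage < 0.8:  # hypothetical coverage goal
        return "add_tests_for_uncovered_code"
    return None


def record(state, step, **results):
    """Record a finished step and the results it produced."""
    state.completed.append(step)
    for name, value in results.items():
        setattr(state, name, value)
    return state
```

Because the process is ordinary code, it can be reviewed, versioned, and tailored to a particular effort in the same way as the software it governs.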

3. Regression Testing. A relatively small task is needed to address the issue of regression testing. As previously
discussed, the SDS will be subject to continually changing requirements and operating environments.
Thus, considerable effort will be invested in retesting and reevaluating the software. Efforts to
facilitate this activity, and reduce its scope, offer potentially enormous payoffs. One of the most important
issues is that of sensitivity focus: from the earliest stages, all products should be analyzed to determine
whether the regression testing required by a change to the software is proportional to the scope of that
change. Additional issues particularly pertinent to regression testing include traceability of system
requirements through to testing and evaluation objects/activities and the recording of testing and evaluation
histories.

9.2.2 Dynamic and Static Analysis R&D Tasks

1. Techniques and Tools for Analysis of Concurrent and Real-Time Software. There are large research
areas of critical importance to SDS testing and evaluation which are still largely unaddressed. SDS
software will be highly concurrent and will have a strong real-time orientation. Research into testing and
analysis of such software is barely beginning. There are a small number of early research efforts underway.
The limitations of these efforts are well recognized, and the need to strengthen them and augment
them with others is also well known. Major new research efforts are required to develop the technology
needed for adequate testing and analysis of this type of software.

2. SDS Software Validation Suites. It is extremely desirable that a base set of tests that are applicable to
each component of SDS software be developed. Such a validation suite would play a valuable role in
achieving a known level of software assurance across all SDS software, and comprise a major element of
acceptance testing. In many respects, this validation suite would be similar to the Ada Compiler Validation
Capability (ACVC) established by the Ada Joint Program Office. It could be made available to
software developers, and all software could be required to pass the tests as a measure of readiness for acceptance
testing. As faults are identified in operational software, the validation suite would be extended with tests
which could detect these faults prior to deployment. Substantial resources will be required to develop and
apply the validation suite. It will be a major technical challenge to determine the requirements for,
design, develop, and maintain an evolving validation suite which provides both a good level of software
assurance and efficient utilization of computing resources. The programmatic issues concerned with
assigning responsibilities and allocating necessary resources for developing, maintaining, and applying the
validation suite must also be investigated.

Validation suites can serve additional roles. For example, the issue of correctness of run-time kernels is
crucial to Ada software but unlikely to be fully resolved within the next five years. Meanwhile, validation
suites for Ada tasking programs supported by specifications of the expected run-time behavior could be
developed. When a specified program is run with a particular kernel and its behavior fails to conform to
the specification, this will serve as an indication of some problem with the kernel, compiler or such. This
is a non-trivial task since the run-time analysis will generally require the use of formal specifications. Such
specifications, however, if carefully planned, will ultimately be useful in the formal verification of the
software under examination. This type of role for validation suites has many applications. Another example
would be a validation suite for simulators of SDS software.

3. Develop Methods for Formally Specifying Real-Time, Distributed, and Degraded Systems and Testing
Behavior Against Specifications. The following subsection (Section 9.2.3) identifies the need for
methods for reasoning about real-time, distributed and degraded systems. To do this will require breakthroughs
in three or four separate technologies, including specification languages, proof rules accompanied
by formal semantics, proof methods, and automated proof systems. Meanwhile, the path towards
formal verification can be exploited to yield early, practical results. To this end, methods for formally
specifying the expected behavior of these types of systems, and then testing actual behavior against the
specifications, should be developed.
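
One simple form of testing actual behavior against a specification is to check a recorded execution trace against a timing property. The sketch below assumes a hypothetical trace format of (time, kind, message id) tuples and a single property, that every request is answered within max_latency time units; both are illustrative assumptions rather than a proposed SDS specification notation.

```python
def check_trace(events, max_latency):
    """Return the ids of requests that violate the response-time property.

    events is a list of (time, kind, msg_id) tuples, where kind is
    "request" or "response". A request violates the property if its
    response arrives later than max_latency, or never arrives.
    """
    pending = {}      # msg_id -> time the request was issued
    violations = []
    for time, kind, msg_id in sorted(events):
        if kind == "request":
            pending[msg_id] = time
        elif kind == "response" and msg_id in pending:
            if time - pending.pop(msg_id) > max_latency:
                violations.append(msg_id)
    violations.extend(sorted(pending))  # requests that were never answered
    return violations
```

A checker of this kind acts as an oracle for one property only; it says nothing about behavior the trace does not exercise.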

4. Develop Methods for Building Self-Checking Software in Multiprocessor Systems. The correctness of
software in multiprocessor systems cannot be assured prior to deployment with current technology. Even
if this were not the case, self-checking software that can guard against post-deployment corruptions is still
advisable. Consequently, self-checking software has a potentially vital importance for SDS software. As
with the other R&D tasks listed here, some early work in this area is being performed. Much additional
effort is required, however, to produce products which can be applied to SDS.

9.2.3  Formal  Verification  R&D  Tasks 

1.  Identify  Critical  Properties  to  be  Formalized  and  Verified,  Levels  of  Criticality,  Priorities.  Critical 
properties  of  a  system  as  a  whole  must  be  identified  as  early  as  possible  in  the  development  process. 
These  properties  affect  critical  design  decisions  in  partitioning  and  allocating  functional  responsibilities 
within  a  system.  Properties  that  must  be  proven  at  the  system  level  imply  requirements  for  components 
with  proven  properties  and  construction  techniques  that  preserve  those  properties. 

As  a  corollary  to  identifying  critical  properties  and  components,  identification  of  levels  of  criticality 
would  help  in  assessing  verification  requirements,  assigning  priorities  to  development  and  verification 
effort,  and  allocating  assurance  resources  between  testing  and  verification. 

2.  Identify  Critical  Life-Cycle  Components  to  be  Formalized  and  Verified.  Critical  system  components 
that  will  require  verification  also  need  to  be  identified  as  early  as  possible  in  the  development  process. 
Proving  properties  of  components  is  much  easier  when  the  proof  process  is  made  an  integral  part  of  their 
design  and  implementation.  In  fact,  proofs  of  correct  components  may  be  impossible  to  construct  after 
the  fact,  because  of  design  decisions  and  programming  practices  that  increase  the  complexity  of  proofs. 
Early  identification  of  critical  components  can  significantly  reduce  verification  schedules  and  effort. 

3.  Develop  Methods  for  Reasoning  about  Real-Time  Systems.  The  critical  property  to  be  verified  in 
most  real-time  systems  is  that  processes  meet  their  deadlines.  Results  are  affected  by  the  algorithms 
employed,  efficiency  of  object  code  generated  by  the  compiler,  scheduling  policies  and  performance  of 
the  run-time  system,  and  target  hardware  performance.  The  problem,  therefore,  spans  the  design  of  the 
entire  system.  Advances  are  needed  in  several  areas  including  real-time  specification  languages,  formal 
semantics, proof rules and methods, and automated support.

In the past the deadline problem has been addressed by worst-case analysis, which tends to produce
overly pessimistic solutions. That is, systems are overbuilt for normal operations to ensure that they can
handle the extreme worst-case deadline situations. Specific techniques employed include: algorithms with
fixed execution times, assembly language or hand-optimized object code, fixed priorities, deterministic
scheduling, and target hardware upgrades. Each of these techniques simplifies the deadline verification
problem, but they often restrict capabilities that could be supported under normal (slack) operating conditions.

Adaptive algorithms and scheduling techniques that adjust to system workload have been introduced to
gain processing capabilities during slack periods. For example, when system workload is light, slower but
more-accurate algorithms can be used and useful background operations can be performed. As the workload
picks up, background operations are dropped and required processes can switch to faster algorithms.
In addition, as a process nears its deadline, it may try to increase its scheduling priority to assure
its completion. Formal methods are needed for reasoning about these techniques that would allow
development of proofs of adaptive program behavior.
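
The switch between a slower, more accurate algorithm and a faster one can be illustrated with a small sketch. It assumes a hypothetical workload figure between 0 and 1 and an arbitrary threshold of 0.7; the Taylor-series sine is used only because the trade-off between term count and accuracy is easy to see.

```python
import math


def estimate_sine(x, terms):
    """Taylor-series estimate of sin(x); more terms cost more time but are more accurate."""
    return sum(
        (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
        for k in range(terms)
    )


def adaptive_sine(x, workload, busy_threshold=0.7):
    """Pick the cheap variant under heavy workload, the accurate one otherwise."""
    terms = 3 if workload >= busy_threshold else 10
    return estimate_sine(x, terms)
```

Even in this small case, the result returned depends on system load at the time of the call, which is the kind of behavior that current proof methods do not address.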

4. Develop Methods for Reasoning about Distributed Systems. Distributed systems are characterized by
communication latencies between subsystems. This makes it extremely difficult for subsystems to synchronize
their actions. Timing constraints on coordination, for example, may not allow subsystems to
fully verify each other’s actions or readiness. That is, some subsystems may have to proceed on the
assumption that the other subsystems are performing their functions at the right time. Formal verification
of such systems requires methods for reasoning about system behavior where the timing of state transitions
is uncertain.

5. Develop Methods for Reasoning about "Degraded" Systems. Fault-tolerant systems are able to
recover from or adapt to certain types of component failures. Many such systems may continue to
operate in a degraded mode until the failed component can be repaired or replaced. Current verification
techniques assume correct operation of underlying hardware and peripheral devices such as sensors.
Methods for reasoning about system behavior in the presence of potential component failures are needed to
verify fault-tolerant systems.

6. Develop Support for Proving Attributes Throughout System Development. Second-generation
verification tools are improving the utility of earlier tools by assuring the soundness of underlying logic
systems, standardizing on programming and annotation languages (Ada and Anna), improving user interfaces,
and improving performance. Additional standards and production-quality tools are needed for formal
requirements and design notations that can be used as a basis for proofs.

9.2.4  Measurement  Technology  R&D  Tasks 

1. Develop a Comprehensive Measurement Methodology. Current measurement methodologies do not
provide an adequate framework for metric analysis. Methods must be developed which address the
requirements of the measurement process, guide the appropriate selection of metrics, and aid in collecting,
interpreting, and validating metric results. Such a measurement methodology should encourage the
top-down generation of metrics based upon the needs of the particular project/organization (for example,
primary emphasis on reliability or portability, cost and time factors). The methodology should support
the tailoring of metrics based on the specific needs of the project/organization, and the particular characteristics
of the project environment.

2. Integration of Measurement into Software Development. The fundamental purpose of software measurement
is the generation of knowledge and information which will permit the creation of higher-quality
products. The measurement process must produce results which are fed back into the software development
process to support improvement and learning.

3. Automated Tool Support. There currently exists almost no metric tool support other than tools which
automate metric analysis. The integrated measurement methodology previously mentioned will require
automation at various levels. Requirements determination, metric selection, collection, interpretation,
and validation will all require some degree of automated tool support. In addition, a historical database
should be established which can be used to validate existing metrics and examine proposed metrics.

9.2.5 Software Reliability Assessment R&D Tasks

1. Assessment of the Software Development Process. To support the construction of reliable software,
emphasis must be placed on the software development process. That is, the targets of software reliability
assessment must be software development methodologies, practices, tools, techniques, and other elements
of the process, rather than individual software products. The motivation here is that the best way to
construct reliable software is to utilize software development methodologies that have been shown to
afford the highest degree of reliability. This task should be accomplished through the Measurement Technology
R&D tasks described above, by ensuring proper emphasis on metrics that capture the concepts of
reliability. A necessary precondition, of course, is the effective representation of development processes
through such a medium as process programming.

2. Assessment of Software Reliability in a System Context. The technology must enable software reliability
to be assessed in a system context. In distributed real-time systems, the software is responsible for
dealing with timing constraints, hardware failures, and software faults. Accordingly, software correctness
and reliability depend on whether software meets its requirements with respect to real-time and fault
tolerance.

3. Assessment of System Reliability. Since software reliability must be taken into account when assessing
system reliability, the technology must support system reliability assessment. In regard to this issue, the
traditional practice of casting software reliability in hardware reliability terms and then using combinatorial
analysis to derive system reliability needs to be rethought. In particular, the distinction between
design faults and age-related faults needs to be given further consideration.
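
For reference, the traditional combinatorial calculation mentioned above can be sketched as follows. The formulas are the standard series and parallel combinations for components that fail independently; the function names are illustrative assumptions.

```python
import math


def series_reliability(reliabilities):
    """System works only if every component works (independent failures assumed)."""
    return math.prod(reliabilities)


def parallel_reliability(reliabilities):
    """System works if at least one redundant component works (independent failures assumed)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)
```

The independence assumption is what makes the parallel formula attractive for hardware subject to age-related faults. Identical copies of a program share the same design faults, so treating them as independent redundant components would overstate system reliability.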

9.3 Monitor Technology Research

This report does not identify any technology deficiencies which have not previously been recognized.
Consequently, some of the problems discussed herein are already being addressed by various researchers.
The SDIO must maintain a close awareness of these ongoing R&D efforts so that (1) promising developments
are promptly considered for SDS practice, (2) the SDIO helps to fund efforts which indicate solutions
to specific SDS problems, and (3) none of the SDIO-sponsored research duplicates
other work.

An initial list of ongoing R&D efforts to monitor can be compiled from those mentioned in this report.
Mechanisms for establishing, and maintaining, contact with these efforts must be developed, and
appropriate responsibilities assigned. General contact with the research community as a whole is necessary
so future research efforts are considered for inclusion in this task as they arise. For example,
researchers from the University of California (Irvine), the University of Massachusetts, and Purdue
University are collaborating in the development of a ^ year research plan which is a candidate for future
inclusion in the list of R&D efforts to monitor.


APPENDIX A: GLOSSARY OF TERMS


This glossary attempts to define the majority of terms used in this paper. The reader is assumed to be familiar
with general software-related terms; the glossary therefore focuses on testing and evaluation terms.
Where necessary, the reader is referred to the IEEE Standard Glossary of Software Engineering Terminology
for additional definitions.

Many  of  the  following  definitions  are  taken  from  the  IEEE  and  other  existing  glossaries.  In  each  such 
case,  the  definition  is  followed  by  a  reference  to  the  relevant  source.  These  sources  are  listed  at  the  end  of 
the  glossary. 


A1  CERTIFICATION:  The  distinguishing  feature 
of  systems  in  this  class  is  the  analysis  derived 
from  formal  design  specifications  and  verification 
techniques  and  the  resulting  high  degree  of 
assurance  that  the  Trusted  Computing  Base  is 
correctly  implemented.  This  assurance  is 
developmental  in  nature,  starting  with  a  formal 
model  of  security  policy  and  a  formal  top-level 
specification  of  the  design. 

ABSTRACTION:  (1)  A  view  of  a  problem  that 
extracts  the  essential  information  relevant  to  a 
particular  purpose  and  ignores  the  remainder  of 
the  information.  (2)  The  process  of  forming  an 
abstraction. [IEEE83].

ABSTRACT  MACHINE:  A  representation  of  the 
characteristics  of  a  process  or  machine.  [IEEE83] 

ABSTRACT MODEL SPECIFICATIONS: Also
called the Predicate Transform Method. For syntax
it employs the basic precondition/postcondition
format developed by Hoare. It defines
functions in terms of an underlying abstraction
selected by the specifier.

ACCEPTANCE TESTING: Formal testing conducted
to determine whether or not a system
satisfies its acceptance criteria and to enable a
customer to determine whether or not to accept
the system. See also QUALIFICATION TESTING,
SYSTEM TESTING.

ACCESSIBILITY: Code possesses the characteristic
accessibility to the extent that it facilitates
selective use of its parts. [RADC83].


ACCURACY:  (1)  Those  attributes  of  the 
software  which  provide  the  required  precision  in 
calculations  and  outputs,  or  (2)  a  measure  of  the 
degree  of  freedom  from  error;  the  degree  of 
exactness  possessed  by  an  approximation  or 
measurement.  [RADC83]. 

ADAPTABILITY:  A  measure  of  the  ease  with 
which  a  program  can  be  altered  to  fit  differing 
user  images  and  system  constraints.  [RADC83]. 

ADAPTIVE ALGORITHMS: Multiple algorithms
that are selected at run-time depending on program
conditions such as workload and required
accuracy.

ADAPTIVE  PROGRAMS:  Programs  that  employ 
adaptive  algorithms. 

ADEQUATE TEST DATA: A test data set T is
adequate for a program P if P behaves correctly
on T but all incorrect programs behave
incorrectly on T.
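
The definition can be made concrete with a small sketch. Since the set of all incorrect programs cannot be enumerated, the check below is made relative to a finite, hypothetical list of incorrect alternatives, as is done in mutation-style approaches; the function names and the use of an oracle function to stand for correct behavior are assumptions made for illustration.

```python
def behaves_correctly(program, oracle, test_data):
    """True if the program agrees with the oracle on every input in the test set."""
    return all(program(x) == oracle(x) for x in test_data)


def is_adequate(test_data, program, oracle, incorrect_alternatives):
    """Adequacy relative to a finite list of known-incorrect alternative programs."""
    if not behaves_correctly(program, oracle, test_data):
        return False
    return all(
        not behaves_correctly(alternative, oracle, test_data)
        for alternative in incorrect_alternatives
    )
```

For a program computing absolute value, a test set of positive inputs alone would not be adequate, since an incorrect alternative that returns its input unchanged would also pass.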

AFFIRM: An automated verification system
developed at the University of Southern California,
Information Sciences Institute.

ALGEBRAIC SPECIFICATION: An algebraic
specification is made up of a list of functional
symbols (constructors and defined functions) on
a set of sorts and of a set of axioms defining properties
of the defined functions.

ALGEBRAIC TESTING: A testing approach in
which program correctness is treated as a program
equivalence problem.

ALGORITHM: (1) A finite set of well-defined
rules for the solution of a problem in a finite
number of steps; for example, a complete
specification of a sequence of arithmetic operations
for evaluating sin x to a given precision. (2)
A finite set of rules that gives a sequence of
operations for performing a specific task.
[IEEE83].

ALGORITHMIC NOTATION: Use of an algorithm
to express a proof.

ALPHA  TESTING:  Testing  of  a  software  product 
or  system  conducted  at  the  developer’s  site  by  the 
customer.  See  also  BETA  TESTING. 

ALTERNATE-SUFFICIENT:  As  used  in  Morell’s 
model  of  fault-based  testing,  the  case  where 
either  the  original  program  or  one  of  the  alternate 
programs  must  be  correct. 

ALTERNATIVES: As used in Morell’s model of
fault-based testing, the set of alternative programs
derived by making a series of small,
predefined changes to the original program.

ANNA: A language used to annotate Ada programs
by making assertions on statements, variables,
and program units which can be
transformed into both static and dynamic checks
for certain types of faults and failures.

ANNOTATION LANGUAGE: A language which
defines  assertions  that  can  be  used  to  annotate  a 
product  expressed  in  some  other  language, 
thereby  facilitating  either  static  or  dynamic 
checking  of  particular  properties  of  the  annotated 
product. 

ARC: In a directed graph, the oriented connection
between two nodes. Also called an edge.
[Mill81]

ARCADIA: A software development environment
under development by a consortium of
academic and commercial organizations. The
principal  characteristics  of  the  Arcadia  design 
revolve  around  the  use  of  process  programming 
and  tool  fragments  to  yield  a  highly  flexible  and 
extensible  architecture. 


ARCHITECTURE: See SYSTEM ARCHITECTURE.

ARCHITECTURAL  DESIGN:  (1)  The  process  of 
defining  a  collection  of  hardware  and  software 
components  and  their  interfaces  to  establish  a 
framework  for  the  development  of  a  computer 
system.  (2)  The  result  of  the  architectural  design 
process.  [IEEE83] 

ARIES:  The  generic  interpreter  developed  for  the 
TEAM  testing  environment. 

ARITHMETIC  EXPRESSION:  A  formula  which 
defines  the  computation  of  a  value. 

ARRAY: A composite object consisting of components
that have the same type.

ASSERTION: A logical expression specifying a
program state that must exist or a set of conditions
that program variables must satisfy at a particular
point during program execution; for example,
A is positive and A is greater than B. See also
INPUT ASSERTION, OUTPUT ASSERTION.
[IEEE83]

ASSIGNMENT STATEMENT: An instruction
used to express a sequence of operations, or used
to assign operands to specified variables, or symbols,
or both. [IEEE83]

ASYMPTOTICAL NORMAL ESTIMATOR: An
estimator is called asymptotically normal if its
distribution is approximately normal for sufficiently
large sample sizes.

AUDIT: (1) An independent review for the purpose
of assessing compliance with software
requirements, specifications, baselines, standards,
procedures, instructions, codes, and contractual
and licensing agreements. (2) An activity
to determine through investigation the adequacy
of, and adherence to, established procedures,
instructions, specifications, codes, and standards
or other applicable contractual and licensing
requirements, and the effectiveness of the
implementation.

AUGMENTABILITY: Those attributes of the
software which provide for expansion of
capability for functions and data. Code possesses
the characteristic augmentability to the extent
that it can easily accommodate expansion in component
computational functions or data storage
requirements. [RADC83].

AUTONOMOUS PROOFS: In networks of
processes, an autonomous proof treats a process
like an independent entity which, therefore,
requires an independent specification. The
specifications (but not the code) of the component
processes are used in the proof.

AVAILABILITY:  The  probability  that  a  specified 
function  or  capability  can  be  initiated  or  invoked 
when  the  system  is  operated  in  its  intended 
environment  for  a  specified  period  of  time. 
peMiSS] 

AXIOMATIC  CORRECTNESS  PROOF:  A 
proof  that  employs  the  Axiomatic  Method  to  ver¬ 
ify  correctness  of  a  program. 

AXIOMATIC  METHOD:  See  INVARIANT 
ASSERTION  METHOD. 


BASIC EXECUTION TIME MODEL: Software
reliability model in which the failure process is
assumed to be a nonhomogeneous Poisson process
with linearly decreasing failure intensity.
[Musa87].
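
The following sketch shows the commonly published form of this model, in which failure intensity falls linearly with the expected number of failures experienced. The parameter names lambda0 (initial failure intensity) and nu0 (total expected failures) are assumptions based on that published form and do not appear in this glossary.

```python
import math


def failure_intensity(mu, lambda0, nu0):
    """Failure intensity after mu expected failures: lambda0 * (1 - mu / nu0)."""
    return lambda0 * (1.0 - mu / nu0)


def expected_failures(tau, lambda0, nu0):
    """Expected failures experienced by execution time tau."""
    return nu0 * (1.0 - math.exp(-lambda0 * tau / nu0))
```

Substituting the second function into the first gives an intensity that decays exponentially with execution time, lambda0 * exp(-lambda0 * tau / nu0).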

BETA TESTING: Testing conducted at one or
more customer sites by the end-user of a
delivered software product or system. See also
ALPHA TESTING.

BINDING: The assigning of a value or referent to
an identifier; for example, the assigning of a value
to a parameter or the assigning of an absolute
address, virtual address, or device identifier to a
symbolic address or label in a computer program.
See also DYNAMIC BINDING, STATIC BINDING.
[IEEE83]

BLACK  BOX  TESTING:  See  FUNCTIONAL 
TESTING. 

BLOCK:  See  PROGRAM  BLOCK. 


BREADTH:  The  breadth  of  a  fault-based  tech¬ 
nique  reflects  the  number  of  potential  faults  con¬ 
sidered.  It  may  be  finite  or  infinite. 

BLOCKED:  A  process  is  blocked  when  it  is  wait¬ 
ing  for  an  event  to  occur  before  execution  can 
proceed. 

BLOCKING  FREEDOM:  When  a  process  can 
never  get  into  a  blocked  state. 

BOTTOM  UP  TESTING  STRATEGY:  A  sys¬ 
tematic  testing  philosophy  that  seeks  to  test  those 
modules  at  the  bottom  of  the  invocation  structure 
first. [Mill81]

BOUNDARY  VALUE  ANALYSIS:  A  selection 
technique  in  which  test  data  are  chosen  to  lie 
along  boundaries  of  input  domain  (or  output 
range) classes, data structures, procedure
parameters, etc. Choices often include maximum,
minimum,  and  trivial  values  or  parameters.  This 
technique  is  often  called  stress  testing.  [Adri82] 
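
A minimal sketch of this selection technique, assuming a hypothetical input domain that is a closed integer range: the test data are the values on each boundary and their immediate neighbors. The function name and the choice of neighbors are illustrative assumptions.

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Candidate test inputs around the edges of the valid range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    # Values just outside, on, and just inside each boundary; duplicates
    # collapse when the range is very small.
    candidates = {low - 1, low, low + 1, high - 1, high, high + 1}
    return sorted(candidates)


print(boundary_values(1, 10))
```

The values just outside the range exercise the handling of invalid inputs, while the remaining values exercise the extremes of the valid class.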

BOYER-MOORE  THEOREM  PROVER:  A  tool 
that  mechanizes  proofs  in  a  logical  theory 
developed  by  Boyer  and  Moore.  Primarily  an 
induction  machine,  it  incorporates  many  ad  hoc 
proof  strategies  and  expression  simplifiers. 

BRACKETED  SECTIONS:  Regions  of  text, 
immediately  surrounding  an  input/output  state¬ 
ment  in  which  the  global  invariant  need  not  hold. 

BRANCH  TESTING:  A  test  method  satisfying 
coverage  criteria  that  require  that  for  each  deci¬ 
sion  point  each  possible  branch  be  executed  at 
least  once.  [IEEE83] 

BUG: See FAULT.

BUILT-IN-TEST:  Hardware  with  built-in-test 
capabilities  is  hardware  in  which  diagnostic 
probes  are  designed  into  electronic  components 
to  facilitate  easier  detection  and  investigation  of 
faults. 

BULK  CONSTANT:  The  proportion  of  faults  that 
cause failures.

CALENDAR TIME: Chronological time, including
time during which a computer may not be
running. [Musa87]

CAUSE-EFFECT GRAPH: A combinatorial
logic  network  representing  causes  (distinct  input 
conditions  or  equivalence  classes  of  input  condi¬ 
tions)  and  effects  (output  conditions  or  a  system 
transformation)  and  the  logical  relations  between 
them. 

CAUSE-EFFECT GRAPHING: A test data selection
technique. The input and output domains are
partitioned  into  classes  and  analysis  is  performed 
to  determine  which  input  classes  cause  which 
effect.  A  minimal  set  of  inputs  is  chosen  that  will 
cover  the  entire  effect  set.  [Adri82] 

CERTIFICATION:  Acceptance  of  software  by 
an  authorized  agent  usually  after  the  software  has 
been  validated  by  the  agent,  or  after  its  validity 
has  been  demonstrated  to  the  agent.  [Adri82] 

CHANNEL  NAME:  Symbolic  name  assigned  to  a 
communication  channel. 

CLOCK  TIME:  Elapsed  time  from  start  to  end  of 
program execution, including wait time, on a
running computer. [Musa87]

CODE:  (1)  A  set  of  unambiguous  rules  specifying 
the  manner  in  which  data  may  be  represented  in  a 
discrete  form.  (2)  To  represent  data  or  a  com¬ 
puter  program  in  a  symbolic  form  that  can  be 
accepted  by  a  processor.  (3)  To  write  a  routine. 
(4)  Loosely,  one  or  more  computer  programs,  or 
part  of  computer  program.  [IEEE83] 

CODE  AUDITOR:  An  automated  tool  which 
checks  for  conformance  to  prescribed  program¬ 
ming  standards  and  practices. 

CODE  INSPECTIONS:  See  INSPECTIONS. 

COHESION:  A  relative  measure  of  the  strength 
of  relationships  among  the  internal  components 
of  a  module  insofar  as  they  contribute  to  the  vari¬ 
ation  in  assumptions  made  by  the  outside  pro¬ 
gram  concerning  the  role  the  module  plays  in  the 
program.  [RADC83]. 


COINCIDENTAL CORRECTNESS: Program
testing detects a fault by discovering the effect of
that fault. It is possible, however, that a fault on
an executed path may not produce erroneous
results for some selected test data; this is referred
to as coincidental correctness.
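
A small hypothetical example may make this concrete. The sketch below assumes a routine that is meant to double its input but contains a wrong operator; the function names and inputs are invented for illustration.

```python
def intended_double(x: int) -> int:
    """What the specification asks for."""
    return x * 2


def faulty_double(x: int) -> int:
    """Contains a fault: '+' was written where '*' was intended."""
    return x + 2


def reveals_fault(test_input: int) -> bool:
    """True if this test input exposes the fault through a wrong result."""
    return faulty_double(test_input) != intended_double(test_input)


for x in (2, 3):
    print(x, reveals_fault(x))
```

The input 2 executes the faulty statement yet yields the correct result, so the fault goes unnoticed for that input; the input 3 exposes it.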

COLLATERAL  TESTING:  That  testing  coverage 
which  is  achieved  indirectly,  rather  than  as  a 
direct  object  of  a  testcase  generation  activity. 
[Mill81]

COMMUNICATING  SEQUENTIAL  PROC¬ 
ESSES  (CSP):  A  technique  in  which  the  syn¬ 
chronization  between  concurrent  processes  is 
explicit and, as a result, the semantics of message
passing can be expressed formally.

COMMUNICATION  AXIOM:  Used  in  the  CSP 
proof  to  verify  that  the  assertions  made  in 
sequential  proofs  after  communication  points  are 
valid. 

COMMUNICATION  CHANNELS:  The  logical  or 
physical  means  by  which  data  is  transmitted 
between  devices  and/or  processes. 

COMMUNICATION  HISTORY:  The  history  of 
communication  events  that  occur  during  the  exe¬ 
cution  of  a  distributed  program. 

COMMUNICATION  SPACE:  Consists  of  those 
symbols,  known  within  the  module,  by  which 
information can be passed to or from the module
from outside it. Communication space mechanisms
consist of formal parameters, global variables,
and return parameters. [Mill81]

COMMUNICATION  EVENT  TRACING:  The 
process  of  keeping  a  history  of  the  communica¬ 
tion  events  as  they  occur  in  running  a  distributed 
program. 

COMPETENT  PROGRAMMER  ASSUMP¬ 
TION:  An  assumption  used  in  mutation  testing 
that  competent  programmers  produce  programs, 
which,  if  not  actually  correct,  are  close  to  being 
correct. 

COMPILE-TIME  CHECK:  Checking  performed 
when a computer program expressed in a
problem-oriented language is translated into the
assembly  code  or  machine  code  of  a  particular 
computer. 

COMPILER:  A  computer  program  used  to 
translate a higher order language program into its
relocatable  or  absolute  machine  code  equivalent. 
Contrast  with  INTERPRETER. 

COMPLEXITY:  (1)  The  degree  of  complication 
of  a  system  or  system  component,  determined  by 
such  factors  as  the  number  and  intricacy  of  inter¬ 
faces  and  conditional  branches,  the  degree  of 
nesting,  the  types  of  data  structures,  and  other 
system  characteristics.  [IEEE83]  (2)  A  charac¬ 
teristic  of  the  software  interface  which  influences 
the resources another system will expend or commit
while interacting with the software.

COMPONENT:  A  basic  part  of  a  system  or  pro¬ 
gram.  [IEEE83] 

COMPUTATIONAL FAULT: An incorrect path
computation, such as a fault caused by missing
or  inappropriate  assignment  statements. 

COMPUTATION TESTING: A testing technique
which  analyzes  path  computations  and  selects  test 
data  aimed  at  revealing  computation  faults. 

COMPUTATIONALLY  EQUIVALENT:  Two 
programs,  or  portions  of  a  program,  are  said  to 
be  computationally  equivalent  if  they  produce  the 
same  results. 

COMPUTATIONALLY  INTENSIVE:  Software 
where  the  majority  of  processing  occurs  in  com¬ 
puting  required  functions  rather  than  handling 
inputs  and  outputs. 

COMPUTATIONAL  INDUCTION:  An  inductive 
method  for  proving  things  about  recursively 
defined  functions. 

COMPUTER  SYSTEM:  A  functional  unit,  con¬ 
sisting  of  one  or  more  computers  and  associated 
software,  that  uses  common  storage  for  all  or  part 
of  a  program  and  also  for  all  or  part  of  the  data 
necessary  for  the  execution  of  the  program;  exe¬ 
cutes  user-written  or  user-designed  programs; 
performs user-designated data manipulation,
including arithmetic operations and logic operations;
and that can execute programs that modify
themselves during their execution. A computer
system may be a standalone unit or may consist of
several interconnected units. Synonymous with
ADP system, computing system. [IEEE83]

CONCURRENCY  HISTORY:  The  sequence  of 
concurrency  states  beginning  with  the  initial  state 
of  a  concurrent  system.  A  proper  history  is  a 
finite  history  in  which  all  elements  are  unique, 
save  possibly  the  final  element  of  the  sequence.  A 
complete  history  of  a  program  S  is  the  set  of  all 
proper  histories  of  S. 

CONCURRENCY STATE COVERAGE:
Member  of  a  series  of  successively  more  stringent 
testing  coverage  measures  analogous  to  structural 
and  data  flow  testing  criteria  for  sequential  pro¬ 
grams.  See  also  STATE  TRANSITION  COVER¬ 
AGE,  SYNCHRONIZATION  COVERAGE. 

CONCURRENCY STATE GRAPH: A graphical
representation  of  a  complete  concurrency  history 
where  each  node  represents  a  unique  con¬ 
currency  state  and  each  edge  represents  a  transi¬ 
tion  from  one  concurrency  state  to  another. 

CONCURRENCY  STATE:  A  concurrency  state 
summarizes the control state of each of the concurrent
processes at some point in an execution,
including  synchronization  information,  while 
omitting  other  information  such  as  data  values. 

CONCURRENT PROCESSES: Processes that
may  execute  in  parallel  on  multiple  processors  or 
asynchronously  on  a  single  processor.  Concurrent 
processes  may  interact  with  each  other,  and  one 
process  may  suspend  execution  pending  receipt 
of  information  from  another  process  or  the 
occurrence of an external event. [IEEE83]

CONFIGURATION: (1) The collection of interconnected
objects which make up a system or
subsystem.  (2)  The  total  software  modules  in  a 
software  system  or  hardware  devices  in  a 
hardware  system  and  their  interrelationships. 
[DACS79] 

CONSISTENCY: Those attributes of the software
which provide for uniform design and
implementation techniques and notation.
[RADC83].

CONSISTENT  ESTIMATOR:  An  estimator  is 
said  to  be  consistent  if  its  variance  tends  to  zero 
and  its  expectation  tends  to  the  true  population 
parameter  as  the  sample  size  tends  to  infinity. 

CONTROL  PATH:  The  sequence  of  control 
statements  that  affect  the  execution  of  a  particu¬ 
lar  program  path. 

CONSTRUCTIVE  ASSERTION:  An  assertion 
used  by  the  constructive  program  verification 
method. 

CONSTRUCTIVE PROGRAM VERIFICATION:
Correctness proofs are established constructively
by interweaving the generation of program
statements and their accompanying assertions
with proof justifications.

CONTROL  FLOW:  The  sequence  of  operations 
performed  in  the  execution  of  an  algorithm. 
[IEEE83] 

CONTROL  LOCATION  PREDICATES:  A  set  of 
axioms  which  state  how  control  behaves  in  a  con¬ 
struct. 

CONTROL  STRUCTURE:  (1)  A  construct  that 
determines the flow of control through a computer
program. [IEEE83]. (2) The sequence of
control  constructs  performed  in  the  execution  of 
a  program. 

COOPERATION  PROOF:  A  proof  in  which 
processes  cooperate.  That  is,  the  process  interac¬ 
tions  maintain  the  global  assertion  and  all  local 
assertions  which  are  made. 

CORRECTNESS: See PROGRAM CORRECTNESS.

CORRECTNESS  PROOF:  See  PROOF  OF 
CORRECTNESS. 


CORRECTNESS  SPECIFICATIONS:  In  the 
constructive  method,  a  sublanguage  which  for¬ 
mally  describes  the  specifications  of  the  program. 

CORRELATION  PRINCIPLE:  There  exists  a 
narrow  correlation  between  specification  and 
implementation  structure. 

COUPLING:  A  measure  of  the  strength  of  data 
interconnection  among  modules. 

COUPLING  EFFECT:  An  assumption  used  in 
mutation  testing  which  states  that  test  data  that 
can  distinguish  between  programs  differing  from 
a  correct  one  by  only  simple  errors  is  so  sensitive 
that  it  also  implicitly  distinguishes  from  more 
complex  errors. 

COVERAGE  ANALYZER:  A  software  tool 
which  determines  and  assesses  measures  associ¬ 
ated  with  the  invocation  of  program  structural 
elements  to  determine  the  adequacy  of  a  test  run. 

COVERAGE CRITERIA: Usually applied to coverage
of a program's logic, coverage criteria
specify  that  each  statement,  branch,  or  path  must 
be  executed  at  least  once  during  program  testing. 

COVERAGE  MEASURE:  See  TESTING  COV¬ 
ERAGE  MEASURE. 

COVERAGE  MONITOR:  See  COVERAGE 
ANALYZER. 

CRITICAL  RANGE:  Metric  values  used  to  clas¬ 
sify  software  into  categories  of  acceptable,  margi¬ 
nal  and  unacceptable.  [IEEE88] 

CRITICAL  SECTION:  A  segment  of  code  to  be 
executed  mutually  exclusively  with  some  other 
segment of code is called a critical section. Segments
of code are required to be executed mutually
exclusively if they make competing uses of a
computer  resource  or  data  item.  [IEEE83] 

CRITICAL  VALUE:  Metric  value  of  a  validated 
metric  which  is  used  to  identify  software  which 
has  unacceptable  quality.  [IEEE88] 

CROSS-REFERENCER: (1) A computer program
that provides cross-reference information
on system components. For example, programs
can  be  cross-referenced  with  other  programs, 
macros,  parameter  names,  etc.  This  capability  is 
useful  in  problem-solving  and  testing  to  assess 
impact  of  changes  to  one  area  or  another.  (2)  A 
utility  program  which  provides  cross-reference 
data  concerning  a  program  written  in  a  higher 
level language. These utility programs analyze a
source  program  and  provide  as  output  such  data 
as  follows:  1.  Statement  label  cross-index,  2. 
Data  name  cross-index,  3.  Literal  usage  cross¬ 
index,  4.  Inter-subroutine  call  cross-index,  5.  Sta¬ 
tistical  counts  of  statement  types. 

CYCLOMATIC COMPLEXITY: The cyclomatic
complexity  of  a  program  is  equivalent  to  the 
number  of  decision  statements  plus  1.  [Adri82] 
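
The count described above can be sketched for Python source using the standard `ast` module. Treating only `if`, `for`, and `while` as decision statements is an assumption made for this illustration; other tools may also count Boolean operators or exception handlers.

```python
import ast

# Assumption: these node types are the "decision statements" being counted.
DECISION_NODES = (ast.If, ast.While, ast.For)


def cyclomatic_complexity(source: str) -> int:
    """Number of decision statements in `source`, plus 1."""
    tree = ast.parse(source)
    decisions = sum(isinstance(node, DECISION_NODES) for node in ast.walk(tree))
    return decisions + 1


SAMPLE = """
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "other"
"""

print(cyclomatic_complexity(SAMPLE))
```

The sample contains two `if` statements and one `for` loop, so the sketch reports three decisions plus one.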

DATA:  A  representation  of  facts,  concepts,  or 
information in a formalized manner suitable for
communication,  interpretation,  or  processing  by 
human  or  automated  means.  [IEEE83] 

DATA  ABSTRACTION:  The  result  of  extracting 
and  retaining  only  the  essential  characteristic  pro¬ 
perties  of  data  by  defining  special  data  types  and 
their  associated  functional  characteristics,  thus 
separating  and  hiding  the  representation  details. 
[IEEE83] 

DATA  FLOW  ANALYSIS:  Consists  of  the  graph¬ 
ical  analysis  of  collections  of  (sequential)  data 
definitions  and  reference  patterns  to  determine 
constraints  that  can  be  placed  on  data  values  at 
various  points  of  executing  the  source  program. 

DATA FLOW ANOMALY: A sequence of the
events reference (r), definition (d), and
undefinition (u) of variables in a program that is
either erroneous in itself or often symptomatic of
an error.
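
As an illustration only, the sketch below scans the event sequence of a single variable for suspicious adjacent pairs. It assumes the usual three-event formulation, with an undefinition event (u) alongside reference (r) and definition (d), and assumes that the pairs `ur`, `dd`, and `du` are the anomalies of interest; the function and table names are hypothetical.

```python
# Assumed anomalous event pairs and what each usually indicates.
ANOMALOUS_PAIRS = {
    "ur": "reference to an undefined variable",
    "dd": "definition overwritten before being referenced",
    "du": "definition never referenced before becoming undefined",
}


def find_anomalies(events: str) -> list[tuple[int, str]]:
    """Return (position, pair) for each anomalous adjacent pair of events."""
    found = []
    for i in range(len(events) - 1):
        pair = events[i:i + 2]
        if pair in ANOMALOUS_PAIRS:
            found.append((i, pair))
    return found


for position, pair in find_anomalies("uddrdu"):
    print(position, pair, ANOMALOUS_PAIRS[pair])
```

A sequence such as `udru` (undefined, defined, referenced, undefined) contains none of the assumed pairs and is reported as clean.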

DATA  FLOW  TESTING:  A  testing  technique 
which  provides  a  set  of  successively  more 
stringent  path  selection  criteria  that  guide  the 
selection  of  test  data  to  examine  the  relationships 
between  variable  definitions  and  variable  uses. 

DATA  STRUCTURE:  A  formalized  representa¬ 
tion  of  the  ordering  and  accessibility  relationships 
among data items without regard to their actual
storage configuration. [IEEE83]

DATA  TYPE:  A  class  of  data  characterized  by 
the  members  of  the  class  and  the  operations  that 
can be applied to them; for example, integer,
real, logical. [IEEE83]

DATA-ABSTRACTION IMPLEMENTATION,
SPECIFICATION, AND TESTING SYSTEM
(DAISTS): A compiler-based testing system
which supports testing the implementation of
abstract data types against user-defined algebraic
axiomatic specifications of those data types.

DATA-INTERFACE  ANALYSIS:  A  form  of 
interface  analysis  which  examines  the  transforma¬ 
tions  of  one  type  of  data  into  another  type  based 
on  available  definitions  of  allowable  transforma¬ 
tions. 

DE-EUTROPHICATION MODEL: A reliability
model,  based  on  exponential  failure  intensity  in 
terms  of  time,  developed  by  Jelinski  and 
Moranda. 

DEADLOCK:  The  state  in  which  two  or  more 
processes  are  waiting  for  a  resource  that  is  held 
by  the  other. 
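
One common way to detect this state, sketched here under assumptions, is to look for a cycle in a wait-for graph whose edges record which process is waiting on which. The graph representation and the process names are hypothetical.

```python
def has_deadlock(wait_for: dict[str, set[str]]) -> bool:
    """True if the wait-for graph contains a cycle of waiting processes."""
    visiting: set[str] = set()  # processes on the current search path
    done: set[str] = set()      # processes already fully explored

    def visit(process: str) -> bool:
        if process in done:
            return False
        if process in visiting:
            return True  # reached a process already on the path: a cycle
        visiting.add(process)
        cyclic = any(visit(other) for other in wait_for.get(process, ()))
        visiting.discard(process)
        done.add(process)
        return cyclic

    return any(visit(process) for process in wait_for)


# P1 waits for a resource held by P2, and P2 waits for one held by P1.
print(has_deadlock({"P1": {"P2"}, "P2": {"P1"}}))
```

If P2 is not waiting on anything, the graph has no cycle and the sketch reports no deadlock.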

DEADNESS  FAULT:  A  fault  which  occurs  when 
part  of  a  concurrent  computation  can  no  longer 
proceed  due  to  a  task  communication  failure. 

DEBUGGER:  A  software  tool  intended  to  assist 
the  user  in  software  fault  localization  and,  poten¬ 
tially,  fault  correction. 

DEBUGGING:  The  process  of  correcting  syntac¬ 
tic  and  logical  faults  detected  during  testing. 
Debugging  shares  with  testing  certain  techniques 
and  strategies,  but  differs  in  its  usual  ad  hoc  appli¬ 
cation  and  local  scope.  [Adri82] 

DECENTRALIZED  (SDS)  ARCHITECTURE:  A 
Strategic  Defense  System  architecture  in  which 
important  battle  management  decisions  are  made 
locally  on  a  platform.  (Note  that  command  and 
control  decisions  may  still  be  made  in  a  global, 
centralized  fashion.) 

DECISION NODE: A node in the program
digraph which corresponds to a decision statement
within the program. [Mill81]

DECISION  STATEMENT:  A  statement  in  which 
an  evaluation  of  some  predicate  is  made  that 
(potentially)  affects  the  subsequent  execution 
behavior of the module. [Mill81]

DECISION-TO-DECISION  PATH:  See  SEG¬ 
MENT. 

DEDUCTIVE  SYSTEM:  A  deductive  system  is 
composed  of  axioms  and  rules  of  inference  by 
which  valid  statements,  or  theorems,  may  be 
derived  from  the  axioms  and  other  theorems. 

DEGRADED  SYSTEM:  A  degraded  system  is 
one  in  which  some  functionality  has  been  surren¬ 
dered  in  order  to  allow  continued  processing  of 
critical  functions  after  a  failure  has  occurred. 

DENOTATIONAL  MODEL  OF  PROGRAM¬ 
MING  NOTATION:  The  semantics  of  program¬ 
ming  constructs  of  an  abstract  programming 
language are defined by semantic valuation
functions.

DENOTATIONAL  SEMANTIC  DESCRIPTION 
OF  Ada:  A  description  of  Ada  using  a  denota- 
tional  model. 

DEPLOYMENT:  The  operational  employment  of 
a  system  in  its  intended,  target  environment. 

DESIGN:  (1)  The  process  of  defining  the  software 
architecture,  components,  modules,  interfaces, 
test  approach  and  data  for  a  software  system  to 
satisfy  specified  requirements.  (2)  The  results  of 
the  design  process.  [IEEE83] 

DESIGN  SPECIFICATION:  A  specification  that 
documents  the  design  of  a  system  or  system  com¬ 
ponent;  for  example,  a  software  configuration 
item. Typical contents include system or component
algorithms, control logic, data structures,
data set-use information, input/output formats,
and interface descriptions. [IEEE83]

DESIGN-BASED  FUNCTIONAL  TESTING:  The 
application of test data derived through functional
analysis (see FUNCTIONAL TESTING)
extended to include design functions as well as
requirement functions. [Adri82]

DESK  CHECKING:  The  manual  simulation  of 
program execution to detect faults through
step-by-step examination of the source code for faults
in  logic  or  syntax.  See  also  STATIC  ANALYSIS. 
[IEEE83] 

DETAILED  DESIGN:  (1)  The  process  of  refining 
and expanding the preliminary design to contain
more detailed descriptions of the processing
logic, data structures, and data definitions, to the
extent  that  the  design  is  sufficiently  complete  to 
be  implemented.  (2)  The  result  of  the  detailed 
design  process.  [IEEE83]. 

DETERMINISM:  The  property  of  a  transforma¬ 
tion  process  that  the  same  outputs  are  always  pro¬ 
duced  for  a  given  set  of  inputs.  [DACS79]. 

DETERMINISTIC  PROGRAMS:  Those  pro¬ 
grams  in  which  control  flow  is  deterministic. 

DEVELOPMENTAL  TEST  AND  EVALUA¬ 
TION:  Test  and  evaluation  that  focuses  on  the 
technological  and  engineering  aspects  of  the  sys¬ 
tem,  or  equipment  items.  [DACS79] 

DIFFERENTIAL  MODEL:  A  reliability  model 
proposed  by  Littlewood  to  account  for  the  possi¬ 
bility  that  some  faults  are  more  likely  to  occur 
than  others. 

DIGRAPH:  Short  name  for  directed  graph. 
[Mill81]

DIRECTED  GRAPH:  Consists  of  a  set  of  nodes 
interconnected  with  oriented  arcs.  An  arbitrary 
directed  graph  (digraph)  may  have  many  entry 
nodes  and  many  exit  nodes.  A  program  digraph 
has only one entry and one exit. [Mill81]

DIRECT  METRIC:  A  metric  that  represents  and 
defines  a  software  quality  factor,  and  which  is 
valid  by  definition  (e.g.,  mean-time  to  software 
fault of 1000 operating hours for the factor
reliability). [IEEE88]

DISTRIBUTED ARCHITECTURE: A system
architecture in which there is not a global address
space and which is often geographically distributed.

DISTRIBUTED PROCESSING SYSTEM: (1) A
cooperative  distributed  processing  system  is 
defined  as  a  collection  of  interconnected  process¬ 
ing  elements  with  decentralized  control  that  per¬ 
mits  cooperation  among  processors  for  the  exe¬ 
cution  of  a  single  task.  (2)  Distributed  systems  are 
an  appropriate  response  to  distributed  functions 
to be performed. The functions may be distributed
geographically, operationally, or managerially.
The important characteristic is that they be
functionally  independent  of  one  another  and  have 
weak,  well-defined  data  flow  oriented  interac¬ 
tions.  (3)  A  cooperative  arrangement  of  intercon¬ 
nected  computers  whose  quasi-autonomous 
operations  are  coordinated  by  a  reassignable  exe¬ 
cutive  program.  [DACS79] 

DOMAIN  ERROR:  Is  an  incorrect  path  domain 
that  occurs  due  to  path  selection  or  missing  path 
faults. 

DOMAIN TESTING: A testing technique that
generates  test  data  to  detect  domain  errors  in  a 
program.  Detection  of  domain  errors  is 
guaranteed  within  a  quantifiable  error  bound. 

DYNAMIC  ALLOCATION:  The  allocation  of 
addressable storage and other resources to a program
while the program is executing. [IEEE83]

DYNAMIC  ANALYSIS:  The  process  of  evaluat¬ 
ing  a  program  based  on  execution  of  the  program. 
[IEEE83] 

DYNAMIC  ASSERTION:  A  technique  which 
inserts  assertions  about  the  relationship  between 
program  variables  into  the  program  code.  The 
truth  of  the  assertions  is  determined  as  the  pro¬ 
gram  executes.  [Adri82] 
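
In Python this technique can be approximated with the built-in `assert` statement. The sketch below is a hypothetical routine, not one from the source: it checks a precondition on its input and a postcondition relating the result to the input each time it runs.

```python
import math


def integer_sqrt(n: int) -> int:
    """Largest integer whose square does not exceed n."""
    # Assertion about the input variable, checked as the program executes.
    assert n >= 0, "precondition: n must be non-negative"
    root = math.isqrt(n)
    # Assertion about the relationship between the result and the input.
    assert root * root <= n < (root + 1) * (root + 1), "postcondition violated"
    return root


print(integer_sqrt(10))
```

Note that Python skips `assert` statements when run with the `-O` option, so checks of this kind are typically active during testing rather than in optimized runs.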

DYNAMIC  BINDING:  Binding  performed  during 
execution  of  a  program.  Contrast  with  STATIC 
BINDING.  [IEEE83] 

DYNAMICALLY  RECONFIGURING:  See 
DYNAMIC RESTRUCTURING.

DYNAMIC RESTRUCTURING: The process of
changing software components or structure while
a system is running. [IEEE83]


EDGE:  In  a  digraph,  the  oriented  connection 
between two nodes. Also called an arc. [Mill81]

EFFICIENCY:  The  extent  to  which  the  software 
performs  its  intended  functions  with  a  minimum 
consumption of computing resources. [IEEE83]

EFFICIENT  ESTIMATOR:  If  two  different  esti¬ 
mators  have  the  same  expectation,  then  the  one 
with  the  smaller  variance  is  said  to  be  more 
efficient. 

ELEMENTARY  COMPUTATIONAL  STRUCT- 
URES:  In  a  program,  those  objects  such  as  refer¬ 
ences  to  variables,  arithmetic  expressions  and 
relations,  and  Boolean  expressions  that  may 
appear  independently  or  as  part  of  a  more  com¬ 
plex  component. 

EMULATOR:  Hardware,  software,  or  firmware 
that  supports  the  imitation  of  all  or  part  of  one 
computer  system  by  another.  [IEEE83] 

ENTRY  NODE:  In  a  program  digraph,  a  node 
which  has  more  than  one  outway  and  zero  inways. 
An  entry  node  has  an  in-degree  of  zero  and  a 
non-zero out-degree. [Mill81]

ENVIRONMENT:  The  combination  of  all  exter¬ 
nal  or  extrinsic  conditions  that  affect  the  opera¬ 
tion  of  an  entity.  [DACS79] 

ENVIRONMENT  SIMULATOR:  An  automated 
replication  of  the  external  world  constructed  for 
testing. 

EQUATE:  An  automated  testing  system  which 
merges  weak  mutation  testing  and  perturbation 
testing  to  find  faults  in  the  execution  of  paths  in 
an  Ada  program. 

EQUIVALENCE  PARTITIONING:  A  test  data 
selection  technique  based  on  considerations  of 
(1)  partitioning  the  input  domain  of  a  program 
into  a  finite  number  of  equivalence  classes  such 
that  a  test  of  a  representative  value  of  each  class 
is equivalent to a test of any other value and (2)
each test case should invoke as many different
input conditions as possible in order to minimize
the total number of test cases necessary.
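
Consideration (1) can be sketched as follows, assuming a hypothetical input domain: an integer age field that is valid from 0 to 120. The class names, predicates, and representative values are all invented for illustration.

```python
# Each class pairs a membership predicate with one representative test value.
EQUIVALENCE_CLASSES = {
    "below_range": (lambda age: age < 0, -5),
    "in_range": (lambda age: 0 <= age <= 120, 35),
    "above_range": (lambda age: age > 120, 200),
}


def classify(age: int) -> str:
    """Name of the single equivalence class that contains `age`."""
    matches = [name for name, (belongs, _) in EQUIVALENCE_CLASSES.items()
               if belongs(age)]
    assert len(matches) == 1, "classes must partition the input domain"
    return matches[0]


def representatives() -> list[int]:
    """One test value per class, standing in for every other member."""
    return [value for _, value in EQUIVALENCE_CLASSES.values()]


print([(value, classify(value)) for value in representatives()])
```

Three test values then stand in for the whole integer domain, on the assumption that the program treats every member of a class alike.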

ERROR:  (1)  A  discrepancy  between  a  computed, 
observed,  or  measured  value  or  condition  and  the 
true,  specified,  or  theoretically  correct  value  or 
condition.  [IEEE83]  (2)  A  mental  mistake  made 
by  a  programmer  which  may  result  in  a  program 
fault. 

ERROR  CHECKLIST:  A  list  of  errors  that  must 
be  looked  for  during  an  inspection.  The  list  is 
compiled  from  errors  that  have  frequently  been 
found  during  prior  inspections. 

ERROR  CORRECTION  MODEL:  A  model  to 
estimate  the  mean  correction  time. 

ERROR  CORRECTION  RATE:  Number  of 
errors  corrected  per  unit  of  time. 

ERROR  COUPLING  EFFECT:  The  assumption, 
used  in  mutation  testing,  that  test  data  on  which 
all  simple  mutants  fail  is  so  sensitive  that  it  is 
highly  likely  that  all  complex  mutants  must  also 
fail. 

ERROR  GUESSING:  A  test  data  selection  tech¬ 
nique.  The  selection  criteria  is  to  pick  values  that 
seem  likely  to  cause  [failures].  [Adri82] 

ERROR  OPERATOR:  A  transformation  applied 
to  a  program  to  produce  a  mutation  of  the  pro¬ 
gram  that  contains  a  specific  type  of  fault.  Test 
data  that  can  distinguish  between  the  original  and 
mutated  programs  is  said  to  be  adequate  for 
detecting that fault.
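
A minimal sketch of one such transformation, using hypothetical functions: the operator replaces `<` with `<=`, and a test set is judged by whether any of its inputs makes the original and the mutant disagree.

```python
from typing import Iterable


def is_negative(x: int) -> bool:
    """Original program."""
    return x < 0


def is_negative_mutant(x: int) -> bool:
    """Mutation of the original: '<' replaced by '<='."""
    return x <= 0


def distinguishes(test_data: Iterable[int]) -> bool:
    """True if some test input gives different results for the two programs."""
    return any(is_negative(x) != is_negative_mutant(x) for x in test_data)


print(distinguishes([-1, 1]))
print(distinguishes([-1, 0, 1]))
```

Only the test set that includes 0 separates the two programs, so only that set would be judged adequate for this particular fault.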

ERROR  QUEUE  LENGTH:  Number  of  errors 
detected  waiting  to  be  processed  by  the  fault- 
correction  personnel. 

ERROR  SEEDING:  The  process  of  intentionally 
adding  a  known  number  of  faults  to  those  already 
in  a  program  for  the  purpose  of  estimating  the 
number  of  indigenous  faults  in  the  program. 
[IEEE83] 
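
The estimate is commonly computed by assuming that seeded and indigenous faults are equally likely to be detected, so that the fraction of seeded faults found approximates the fraction of indigenous faults found. The sketch below works under that assumption; the function and parameter names are illustrative.

```python
def estimate_indigenous_faults(seeded_total: int,
                               seeded_found: int,
                               indigenous_found: int) -> float:
    """Estimated total number of indigenous faults in the program.

    Assumes seeded and indigenous faults are detected at the same rate.
    """
    if not 0 < seeded_found <= seeded_total:
        raise ValueError("need 0 < seeded_found <= seeded_total")
    return indigenous_found * seeded_total / seeded_found


# 20 faults seeded; testing finds 10 of them and 15 indigenous faults.
print(estimate_indigenous_faults(seeded_total=20, seeded_found=10,
                                 indigenous_found=15))
```

With half of the seeded faults found, the sketch infers that roughly half of the indigenous faults have been found as well.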

ERROR  SENSITIVE  TEST  CASE  ANALYSIS 
(ESTCA): A testing technique that provides rules
for generating test data sensitive to commonly
occurring faults.

ERROR-BASED  TESTING:  Testing  where  infor¬ 
mation  about  programming  style,  error-prone 
language  constructs,  and  other  programming 
knowledge,  is  applied  to  select  test  data  capable 
of  detecting  faults,  either  a  specified  class  of 
faults  or  all  possible  faults. 

EVALUATION:  The  process  of  examining  a  sys¬ 
tem  or  system  component  to  determine  the  extent 
to  which  specified  properties  are  present. 

EVOLUTIONARY  DEVELOPMENT  AND 
DEPLOYMENT:  A  paradigm  for  constructing 
computer  systems  where  the  system  is  developed 
and  deployed  in  a  series  of  versions  with  increas¬ 
ing  functionality. 

EXCEPTION:  An  event  that  causes  suspension 
of  normal  program  execution.  [IEEE83] 

EXCEPTION  HANDLING:  A  set  of  program¬ 
ming  techniques  for  recognizing  and  acting  upon 
exceptions. 

EXECUTABLE SPECIFICATION: A
specification which is given in a sufficiently formal
notation to allow its execution by a computer.

EXECUTABLE  STATEMENT:  A  statement  in  a 
module  which  is  executable  in  the  sense  that  it 
produces object code instructions. [Mill81]

EXECUTION:  The  process  of  carrying  out  an 
instruction or the instructions of a computer program
by a computer. [IEEE83]

EXECUTION ENVIRONMENT: See ENVIRONMENT.

EXECUTION PATH: See PATH.

EXECUTION  TIME:  (1)  The  amount  of  actual  or 
central  processor  time  used  in  executing  a  pro¬ 
gram.  (2)  The  period  of  time  during  which  a  pro¬ 
gram  is  executing.  See  also  RUN  TIME.  [IEEE83] 

EXHAUSTIVE TESTING: Executing the program
with all possible combinations of values for
program variables. [Adri82]

EXIT  NODE:  In  a  digraph,  a  node  which  has 
more  than  one  inway,  but  has  zero  outways.  An 
exit node has zero out-degree, and a non-zero
in-degree. [Mill81]

EXPECTED VALUE: Mean of a random variable.
[Musa87]

EXPRESSION  ANALYSIS:  A  form  of  static 
error  analysis  that  detects  certain  commonly 
occurring  faults  associated  with  the  evaluation  of 
expressions;  for  example,  incorrect  or  incom¬ 
plete  parentheses. 

EXPRESSION  SET:  In  the  EQUATE  system,  the 
set  of  all  expressions  and  subexpressions  from 
the  abstract  syntax  tree  of  the  module  under  test. 

EXTENT:  The  extent  of  a  fault-based  testing 
technique  reflects  the  scope  of  information  used 
to  determine  the  absence  of  a  predefined  set  of 
possible  faults.  It  may  be  local  or  global. 

EXTREMAL  TEST  DATA:  Test  data  that  is  at  the 
extreme or boundary of the domain of an input
variable  or  which  produces  results  at  the  boun¬ 
dary  of  an  output  domain.  [Adri82] 

FACTOR  SAMPLE:  A  set  of  factor  values  which 
is  drawn  from  the  metrics  data  base  and  used  in 
metrics  validation.  [IEEE88] 

FACTOR VALUE: A value (see metric value) of
the direct metric that represents a factor.
[IEEE88] 

FAIL-SAFE:  A  fail-safe  system  limits  the  amount 
of  damage  caused  by  a  failure  and  may  not  strive 
to  continue  functionality. 

FAIL-SOFT:  A  fail-soft  system  continues  opera¬ 
tion  but  provides  only  degraded  performance  or 
reduced  functional  capabilities  until  the  fault  is 
removed. 

FAILURE:  The  inability  of  a  system  or  system 
component  to  perform  a  required  function  within 
specified limits. A failure may be produced when
a fault is encountered. [IEEE83]

FAILURE COUNT MODEL: Software reliability
model  in  which  the  failure  process  is  represented 
as  a  stochastic  process  with  a  time  dependent 
failure  rate.  [Goel85]. 

FAILURE  DETECTION  RATE:  Number  of 
failures  detected  per  unit  of  time. 

FAILURE  HISTORY:  In  software  reliability 
modeling,  the  record  of  software  failures,  in 
terms  of  failure  times  or  failure  counts  per  inter¬ 
val. 

FAILURE  INTENSITY:  Failures  per  unit  of  time, 
the  derivative  with  respect  to  time  of  the  mean 
value  function  of  failures.  [Musa87] 

FAILURE  INTENSITY  DECAY  PARAMETER: 
In  the  logarithmic  Poisson  execution  time  model, 
the  parameter  that  represents  the  rate  of 
exponential  decay  of  the  failure  intensity  as  a 
function  of  mean  failures  experienced.  [Musa87]. 
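
The relationship between failure intensity, mean failures experienced, and the decay parameter can be sketched in Python. The closed forms below are the commonly published ones for the logarithmic Poisson execution time model; the glossary itself does not reproduce them, so treat them as an assumption. The names `lambda0` (initial failure intensity) and `theta` (decay parameter) are chosen for this example.

```python
import math

def mean_failures(tau: float, lambda0: float, theta: float) -> float:
    """Expected failures experienced by execution time tau: mu(tau)."""
    return math.log(lambda0 * theta * tau + 1.0) / theta

def failure_intensity(tau: float, lambda0: float, theta: float) -> float:
    """Failure intensity at execution time tau, the derivative of mu(tau)."""
    return lambda0 / (lambda0 * theta * tau + 1.0)

def intensity_from_failures(mu: float, lambda0: float, theta: float) -> float:
    """Failure intensity decaying exponentially in mean failures experienced."""
    return lambda0 * math.exp(-theta * mu)
```

Evaluating `failure_intensity` at some time and `intensity_from_failures` at the corresponding `mean_failures` value should give the same number, which is a quick consistency check on the two views of the model.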

FAILURE  INTERVAL:  Time  between  failures. 
[Musa87] 

FAILURE  PROBABILITY:  The  probability  of 
failure  under  specified  conditions. 

FAILURE  RATE:  The  ratio  of  the  number  of 
failures  to  a  given  unit  of  measure;  for  example, 
failures  per  unit  of  time,  failures  per  number  of 
transactions,  failures  per  number  of  computer 
runs.  [IEEE83] 

FAILURE  SEVERITY:  Classification  of  a  failure 
by  its  operational  impact.  [Musa87] 

FALSE  ASSUMPTION  DECOMPOSITION 
ERROR:  Errors  resulting  from  incorrect  assump¬ 
tions  about  the  meaning  or  usage  of  data. 

FAULT:  A  manifestation  of  an  error  in  software. 
A  fault,  if  encountered,  may  cause  a  failure. 
[IEEE83] 

FAULT CORRECTION RATE: Number of
failures corrected by the failure-correction
personnel per unit of time.

FAULT CORRECTION PROFILES: The
profiles (in terms of number of people and fault
correction rate) of the failure-correction
personnel assumed for a particular model.

FAULT DENSITY: Probability density of the
failures.

FAULT  REDUCTION  FACTOR:  Net  reduction 
in  faults  per  failure  experienced.  [Musa87] 

FAULT SEEDING MODEL: Software reliability
model in which the number of indigenous faults
in a program is estimated from the number of
seeded faults and indigenous faults that are
detected. [Goel85]
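
A minimal Python sketch of one such estimate follows. It uses the simple ratio estimator often associated with fault seeding, which assumes that seeded and indigenous faults are equally likely to be detected. The glossary does not state a formula, so both the estimator and the function name `estimate_indigenous_faults` are assumptions.

```python
def estimate_indigenous_faults(seeded_total: int,
                               seeded_found: int,
                               indigenous_found: int) -> float:
    """Estimate the total number of indigenous faults in a program.

    Assumes the fraction of seeded faults detected equals the fraction
    of indigenous faults detected.
    """
    if seeded_found <= 0:
        raise ValueError("at least one seeded fault must be detected")
    if seeded_found > seeded_total:
        raise ValueError("cannot detect more seeded faults than were seeded")
    return indigenous_found * seeded_total / seeded_found
```

For example, if 20 of 100 seeded faults and 10 indigenous faults have been detected, the estimate is 50 indigenous faults in total.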

FAULT-BASED  TESTING:  Testing  which 
employs  a  test  data  selection  strategy  designed  to 
generate  test  data  capable  of  demonstrating  the 
absence  of  a  prespecified  set  of  faults;  typically 
frequently  occurring  faults. 

FAULT-TOLERANCE:  The  probability  that  a 
system  detects,  recovers,  and  insulates  itself  from 
the  effects  of  specified  component  faults  or 
failures  in  order  to  maintain  a  high  degree  of  avai¬ 
lability  when  operated  under  stated  conditions  for 
a  specified  period  of  time.  [DeMiSS] 

FAULT-TOLERANT  SOFTWARE:  A  software 
structure  employing  functionally  redundant  rou¬ 
tines  with  concurrent  error  detection,  and  provi¬ 
sions  to  switch  from  one  routine  to  a  functional 
alternate  in  the  event  of  a  detected  fault. 
[DACS79] 

FAULT-TOLERANCE  TECHNIQUES:  Program¬ 
ming  techniques  which  increase  the  fault- 
tolerance  of  a  system  or  system  component;  for 
example,  N-version  programming,  recovery 
blocks. 

FAULT TREE: The tree built during (software)
fault tree analysis, which is developed using
backward reasoning to identify the causes and
conditions which may lead to a critical failure.

FAULT TREE ANALYSIS: A form of safety
analysis that assesses hardware safety to provide
failure statistics and sensitivity analyses which
indicate the possible effect of critical failures.

FIDELITY: Fidelity is defined as the accuracy
with which a given algorithm is mechanized for a
given operating system and hardware system.
[DACS79]

FILE  COMPARATOR:  A  software  tool  which 
compares  two  files  to  identify  discrepancies 
between  them. 
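
As a hedged illustration, the sketch below compares two files represented as lists of lines, using `difflib.ndiff` from the Python standard library. The function name `compare_lines` is hypothetical, and working on in-memory lines rather than files on disk is a simplification for this example.

```python
import difflib

def compare_lines(old: list[str], new: list[str]) -> list[str]:
    """Return only the lines that differ between two versions.

    Lines prefixed "- " appear only in `old`; lines prefixed "+ "
    appear only in `new`. Unchanged lines and hint lines are dropped.
    """
    return [line for line in difflib.ndiff(old, new)
            if line.startswith(("- ", "+ "))]
```

An empty result means the comparator found no discrepancies.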

FINITE INPUT SEQUENCES: A finite set of
symbols  which  forms  a  string  of  characters  used 
as  the  input  for  some  process. 

FINITE  STATE  MACHINE:  A  computational 
model  consisting  of  a  finite  number  of  states,  and 
transitions  between  these  states.  [IEEE83] 
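
A finite state machine can be sketched in Python as a table of transitions keyed by (state, event). The states, events, and the function name `run` below are invented for illustration.

```python
# Transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "start"): "running",
    ("running", "stop"): "idle",
    ("paused", "stop"): "idle",
}

def run(initial: str, events: list[str]) -> str:
    """Apply a finite sequence of events and return the final state."""
    state = initial
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state!r} on {event!r}")
        state = TRANSITIONS[key]
    return state
```

Because both the state set and the transition table are finite, every reachable state can in principle be enumerated and tested.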

FIRM  MUTATION  TESTING:  A  version  of 
mutation  testing  which  merges  the  strengths  of 
strong  and  weak  mutation  testing  by  using  com¬ 
ponents  with  more  extensive  scope  than  weak 
mutation  testing  and  allowing  several  mutants  to 
be  applied  in  a  single  program  execution. 

FIRST-ORDER  LOGIC:  See  PREDICATE  CAL¬ 
CULUS. 

FIRST-ORDER  PREDICATE  CALCULUS:  See 
PREDICATE  CALCULUS. 

FLAVOR  ANALYSIS:  A  form  of  analysis  used  in 
testing  large  scale  software  systems  to  detect 
incorrect  assumptions  about  the  meaning  and 
usage  of  data. 

FLEXIBILITY:  The  effort  to  extend  the  software 
mission,  functions,  or  data  to  satisfy  other 
requirements.  [RADC83] 

FLOWCHART:  A  graphical  representation  of 
the  definition,  analysis,  or  solution  to  a  problem 
in  which  symbols  are  used  to  represent  opera¬ 
tions,  data,  flow,  and  equipment.  [IEEE83] 

FOLLOW-ON OPERATIONAL TEST AND
EVALUATION (FOT&E): Operational test and
evaluation conducted after deployment of a
system, for example, to validate assumptions made
in previous operational testing.

FORMAL DEVELOPMENT METHODOLOGY
(FDM): An automated verification system,
developed by the System Development Corporation
and employing the Ina Jo language.

FORMAL LANGUAGE: A language whose rules
are explicitly established prior to its use.
Synonymous with artificial language. Examples
include programming languages, such as FORTRAN
and Ada, and mathematical or logical
languages, such as predicate calculus. [IEEE83]

FORMAL  SEMANTICS:  The  mathematical 
definition  of  the  semantics  of  a  language. 

FORMAL  SOFTWARE  DEVELOPMENT:  The 
use  of  formal  methods  to  specify,  verify,  and  test 
software. 

FORMAL SPECIFICATION: In proof of
correctness,  a  description  in  a  formal  language  of 
the  externally  visible  behavior  of  a  system  or  sys¬ 
tem  component.  Generally,  a  specification  writ¬ 
ten  and  approved  in  accordance  with  established 
standards.  [IEEE83] 

FORMAL VERIFICATION: See VERIFICATION.

FUNCTION:  (1)  A  specific  purpose  of  an  entity 
or  its  characteristic  action.  (2)  A  subprogram 
that  is  invoked  during  the  evaluation  of  an  expres¬ 
sion  in  which  its  name  appears  and  that  returns  a 
value  to  the  point  of  invocation.  [IEEE83] 

FUNCTIONAL  ABSTRACTION:  A  design  stra¬ 
tegy  in  which  programs  are  viewed  as  a  hierarchy 
of  abstract  functions. 

FUNCTIONAL  REQUIREMENT:  A  requirement 
that  specifies  a  function  that  a  system  or  system 
component  must  be  capable  of  performing. 
[IEEE83] 

FUNCTIONAL  SPECIFICATION:  A  set  of 
behavioral  and  performance  requirements  which, 
in  aggregate,  determine  the  functional  properties 
of a software system. [Mill81]

FUNCTIONAL TESTING: Application of test
data derived from the specified functional
requirements without regard to the final program
structure. [Adri82]

GENERATOR:  In  the  EQUATE  system,  the 
expression  set  term  whose  subexpression  was 
modified  in  the  derivation  of  operand  substitution 
terms. 

GENERIC  COMPONENT:  A  generic  component 
is  one  which  can  be  instantiated  in  a  number  of 
predefined  ways  so  that  each  occurrence  of  the 
component  can  be  tailored  to  suit  a  particular 
usage.  For  example,  a  generic  component  which 
provides  a  set  of  queue  handling  routines  might 
be  designed  so  that  it  can  be  instantiated  to 
operate  on  queues  with  different  message  for¬ 
mats. 

GEOMETRIC  MODEL:  A  reliability  model  pro¬ 
posed  by  Moranda  as  a  variation  of  the  De- 
Eutrophication  model. 

GEOMETRIC  POISSON  MODEL:  A  reliability 
model  proposed  by  Moranda  as  an  alternative  to 
the  Geometric  model. 

GLOBAL  ASSERTION:  Those  assertions  which 
are  valid  for  the  whole  program  being  validated. 

GLOBAL  INVARIANT:  Those  assertions  which 
do  not  change  for  the  whole  program. 

GOAL/QUESTION/METRIC  PARADIGM:  A 
measurement  approach  which  aids  in  determining 
and  specifying  the  goals  of  a  software  develop¬ 
ment  project. 

GRAMMAR-BASED  TESTING:  A  testing 
method  that  generates  test  cases  from  a  formal 
specification  of  a  system  or  system  component. 

GRAPH:  A  model  consisting  of  a  finite  set  of 
nodes  having  connections  called  edges  or  arcs. 
[IEEE83] 

GYPSY SPECIFICATION LANGUAGE: A
language consisting of two intersecting
components: a formal specification language and a
verifiable high-level programming language.

GYPSY VERIFICATION ENVIRONMENT
(GVE): An automated verification system
developed at the University of Texas at Austin
and employing the Gypsy language.


HARDWARE:  Physical  equipment  used  in  data 
processing,  as  opposed  to  computer  programs, 
procedures, rules, and associated documentation.
Contrast with SOFTWARE. [IEEE83]

HARDWARE-IN-THE-LOOP  SIMULATION:  A 
simulation  that  includes  one  or  more  physical  ele¬ 
ments  of  the  system  being  simulated,  interacting 
with  the  software  models  of  the  remaining  system 
elements. 

HAZARD: A set of conditions (state) that has an
unacceptable  risk  of  leading  to  an  accident,  given 
certain  environmental  conditions. 

HAZARD FUNCTION: (1) The probability of an
error occurring in a given infinitesimal time
interval, given that no error has occurred prior
to that interval. (2) Instantaneous failure
rate of a system. (Musa’s model.) (3) The
error-rate relationship. [DACS79]

HAZARD  RATE:  Probability  density  (per  unit  of 
time)  of  failure  given  that  failure  has  not  occurred 
up  to  the  present.  [Musa87] 
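
The conditional density described here can be written as h(t) = f(t) / (1 - F(t)), where f is the failure density and F is the cumulative distribution of time to failure. The Python sketch below evaluates this ratio; the function names and the choice of an exponential distribution for the worked case are assumptions made for illustration.

```python
import math
from typing import Callable

def hazard_rate(pdf: Callable[[float], float],
                cdf: Callable[[float], float],
                t: float) -> float:
    """Failure density at t, conditioned on no failure up to t."""
    survival = 1.0 - cdf(t)
    if survival <= 0.0:
        raise ValueError("no probability of surviving to time t")
    return pdf(t) / survival

def exponential_hazard(rate: float, t: float) -> float:
    """Hazard rate for exponentially distributed times to failure."""
    pdf = lambda x: rate * math.exp(-rate * x)
    cdf = lambda x: 1.0 - math.exp(-rate * x)
    return hazard_rate(pdf, cdf, t)
```

For the exponential case the hazard rate is constant and equal to `rate`, regardless of `t`.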

HEURISTIC:  An  exploratory  method  of  problem 
solving  in  which  solutions  are  discovered  by 
evaluation  of  the  progress  made  toward  the  final 
result. Contrast with ALGORITHM. [DACS79]

HIERARCHICAL  DEVELOPMENT  METHO¬ 
DOLOGY  (HDM):  An  automated  verification 
system  developed  by  SRI  International  and 
employing  the  SPECIAL  state-machine  language. 

HOARE LOGIC: A logic in which the behavior
of a statement S is specified by an assertion P
describing possible states before execution of S,
and a second assertion Q describing possible
states after the execution of S.
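
The triple of P, S, and Q can be illustrated with run-time assertions in Python. This is only a sketch: executing assertions checks the triple for particular inputs, whereas Hoare logic is used to prove it for all states satisfying P. The function name and the specific P, S, and Q are invented for this example.

```python
def checked_increment(x: int) -> int:
    """Illustrates the triple {x >= 0} y = x + 1 {y > 0}."""
    assert x >= 0, "precondition P violated"   # assertion P
    y = x + 1                                  # statement S
    assert y > 0, "postcondition Q violated"   # assertion Q
    return y
```

Calling the function with a negative argument fails at the precondition, which signals a fault in the caller rather than in statement S.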

HOMOGENEOUS: Processing characteristics
that  do  not  vary  with  time. 


HOST  MACHINE:  A  computer  used  to  develop 
software  intended  for  another  computer. 
[IEEE83]

IMPLEMENTATION:  (1)  The  implementation  of 
a  program  is  either  a  machine  executable  form  of 
the  program,  or  a  form  of  the  program  that  can 
be  automatically  translated  (e.g.,  by  compiler  or 
assembler). (2) That process by which an
architectural design is turned into a delivered program.
It  includes  the  detailed  functional  and  procedural 
design,  coding,  testing,  and  documentation 
necessary  to  meet  program  requirements,  either 
for  new  or  modified  software.  [DACS79] 

IMPROVEMENT PARADIGM: A paradigm
which guides the activities necessary to better
understand and learn from the software construction
process.

INA JO: A non-procedural specification language
based  on  an  extension  of  first-order  predicate  cal¬ 
culus,  used  in  the  Formal  Development  Metho¬ 
dology  automated  verification  system. 

INCREMENTAL  ANALYSIS:  Occurs  when  (par¬ 
tial)  analysis  may  be  performed  on  an  incomplete 
product  to  allow  early  feedback  on  the  develop¬ 
ment  of  that  product. 

INCREMENTAL  TESTING:  See  INCREMEN¬ 
TAL  ANALYSIS. 

INDEPENDENT VALIDATION AND VERIFICATION
(IV&V): Verification and validation of a
software product by an organization that is both
technically and managerially separate from the
organization responsible for developing the
product. [IEEE83]

INDETERMINISM:  Inverse  of  DETERMINISM. 

INDUCTION:  The  use  of  a  mathematical  tech¬ 
nique  that  employs  reasoning  from  a  part  to  a 
whole. 

INDUCTIVE  ASSERTION  METHOD:  A  proof 
of  correctness  technique  in  which  assertions  are 
written  describing  program  inputs,  outputs  and 
intermediate  conditions,  a  set  of  theorems  is 
developed  relating  satisfaction  of  the  input 
assertions to satisfaction of the output assertions,
and the theorems are proved to be true. [IEEE83]

INEVITABILITY: A program will inevitably
establish predicate R in the computation started in
state S if and only if for every sequence t in the
computational history of S there is an initial
section r of t such that R holds in the last state of r.

INFEASIBLE  PATH:  A  sequence  of  program 
statements  that  can  never  be  executed.  [Adri82] 

INFERENCE  RULES:  The  basic  building  blocks 
of  formal  proof.  They  generally  consist  of  a 
number  of  hypotheses  and  a  conclusion,  the  idea 
being  that  the  validity  of  the  conclusion  can  be 
inferred from the validity of all the hypotheses.

INFERENCE  SYSTEM:  A  set  of  inference  rules. 

INFORMATION HIDING: The technique of
encapsulating software design decisions in
modules in such a way that the module’s
interfaces reveal as little as possible about the
module’s inner workings; thus, each module is a
“black box” to the other modules in the system.
The discipline of information hiding forbids the
use of information about a module that is not in
the module’s interface specification. [IEEE83]

INITIAL  OPERATIONAL  TEST  AND  EVALUA¬ 
TION:  The  first  phase  of  operational  test  and 
evaluation  conducted  on  preproduction  items, 
prototypes, or pilot production items and
normally completed prior to the first major
production decision. It is conducted to provide a valid
estimate  of  expected  system  operational 
effectiveness  and  suitability. 

INITIAL  VALUE  SET:  In  the  EQUATE  system, 
the  set  of  values  first  taken  on  by  each  expression 
set  term  at  each  test  location  during  testing. 

INPUT:  See  PROGRAM  INPUT. 

INPUT  ASSERTION:  A  logical  expression  speci¬ 
fying  one  or  more  conditions  that  program  inputs 
must  satisfy  in  order  to  be  valid.  [IEEE83] 

INPUT  CLAUSE:  See  INPUT  ASSERTION. 


INPUT  CONDITION:  See  INPUT  ASSERTION. 

INPUT DOMAIN: See INPUT SPACE.

INPUT  DOMAIN  BASED  MODEL:  Software 
reliability  model  in  which  reliability  is  estimated 
from  the  fraction  of  test  runs  resulting  in  failure. 
Failures are weighted according to the
operational profile of the software. [Goel85]
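
One way to make this weighting concrete is sketched below in Python. The input space is assumed to be split into partitions, each with an operational-profile probability and an observed failure fraction. The data layout and the function name `estimate_reliability` are assumptions for this example, not a formula given in the glossary.

```python
def estimate_reliability(partitions: list[tuple[float, int, int]]) -> float:
    """Estimate reliability from test runs weighted by operational profile.

    Each partition is (profile_probability, runs, failures).
    """
    total_probability = sum(p for p, _, _ in partitions)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("profile probabilities must sum to 1")
    if any(runs <= 0 for _, runs, _ in partitions):
        raise ValueError("every partition needs at least one test run")
    # Weighted fraction of runs that fail under the operational profile.
    failure_probability = sum(p * failures / runs
                              for p, runs, failures in partitions)
    return 1.0 - failure_probability
```

With a profile of 70% and 30% and observed failure fractions of 1/100 and 5/50, the estimated reliability is 0.963.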

INPUT  SPACE:  Consists  of  that  subset  of  a 
module’s  communication  space  which  can  be  (1) 
altered  externally  to  the  module,  and  (2)  which  is 
(potentially)  used  within  the  module  in  such  a  way 
that affects its execution. [Mill81]

INPUT-SPACE  PARTITIONING  TESTING 
TECHNIQUES:  Testing  techniques  which  employ 
a  test  data  generation  strategy  based  on  partition¬ 
ing  the  input  space  of  a  program. 

INSPECTION:  A  formal  evaluation  technique  in 
which  software  requirements,  design,  or  code  are 
examined  in  detail  by  a  person  or  group  other 
than  the  author  to  detect  faults,  violations  of 
development  standards,  and  other  problems. 
[IEEE83] 

INSTRUMENTATION: See PROGRAM
INSTRUMENTATION.

INTEGRATION:  The  process  of  combining 
software  elements,  hardware  elements,  or  both 
into  an  overall  system.  [IEEE83] 

INTEGRATION  TESTING:  An  orderly  progres¬ 
sion  of  testing  in  which  software  elements, 
hardware  elements,  or  both  are  combined  and 
tested  until  the  entire  system  has  been  integrated. 
[IEEE83] 

INTEGRITY:  (1)  The  probability  that  stored 
information  and  data  not  be  modified  by 
unauthorized  means.  [DeMiSS]  (2)  The  extent  to 
which  unauthorized  access  to  the  software  or  data 
can  be  controlled.  [RADC83]. 

INTERFACE:  A  shared  boundary.  An  interface 
might  be  a  hardware  component  to  link  two  dev¬ 
ices  or  it  might  be  a  portion  of  storage  or  registers 
accessed  by  two  or  more  computer  programs. 
[IEEE83]

INTERFACE  ANALYSIS:  Checks  the  interfaces 
between  program  elements  for  consistency  and 
adherence  to  predefined  rules  or  axioms. 

INTERFACE  CONTROL:  Interface  control 
requires  that  input/output  specifications  must  be 
controlled as engineering configuration items at
system  design,  implementation,  integration,  and 
operation  times.  [DACS79] 

INTERFERENCE-FREE:  See  NON-INTER¬ 
FERENCE. 

INTERMITTENT  ASSERTION  METHOD:  A 
formal  verification  method  that  proves  properties 
of  programs  by  induction  on  their  space.  The 
method  only  applies  to  while  statements. 

INTEROPERABILITY:  The  effort  to  couple  the 
software  of  one  system  to  the  software  of  another 
system. [RADC83]

INTERPRETER:  (1)  Software,  hardware,  or 
firmware  used  to  interpret  computer  programs. 
Contrast  with  COMPILER.  [IEEE83] 

INTERPROCESS  COMMUNICATION:  The 
sending  and  receiving  of  messages  by  the 
processes/entities  within  an  operating  system. 
[DACS79] 

INVARIANCE: A predicate R is invariant in the
state S if and only if R is true in every state of
every sequence in the computational history of S
unless the state is blocked or empty.

INVARIANT  ASSERTION  METHOD:  A  proof 
method  in  which  one  deduces  the  correctness  of 
complex statements from the correctness of their
components. 

INVOCATION:  (1)  The  transfer  of  control  to  an 
entity  causing  it  to  be  activated.  (2)  The  linking  to 
or  insertion  of  a  procedure  body  by  means  of  a 
named  reference  within  a  procedure.  Subroutine 
linking  is  sometimes  referred  to  as  a  “call.”  Code 
insertion  is  referred  to  as  a  “macro  call.” 
[DACS79]. 


LEMMA:  An  intermediate  conclusion  in  the 
development  of  the  proof  of  a  theorem. 

LIFECYCLE: See SOFTWARE LIFECYCLE.

LINEAR  CODE  SEQUENCE  AND  JUMP  PRO¬ 
GRAM  UNITS:  Sections  of  the  code  through 
which  the  flow  of  control  proceeds  sequentially 
until  terminated  by  a  jump  in  the  control  flow. 

LIVENESS:  A  program  property  that  states  that 
a  desired  state,  such  as  termination,  can  be 
reached. 

LOGARITHMIC  POISSON  EXECUTION  TIME 
MODEL:  Software  reliability  model  in  which  the 
failure  process  is  assumed  to  be  a  nonhomogene- 
ous  Poisson  process  with  exponentially  decreas¬ 
ing  failure  intensity.  [Musa87]. 

LOGICAL  ASSERTIONS:  Logical  postulates 
usually  used  to  characterize  legitimate  program 
input  and  output  states  and  hence  the  effect 
(semantics)  of  the  program. 

LOOP:  A  set  of  instructions  that  may  be  exe¬ 
cuted  repeatedly  while  a  certain  condition  pre¬ 
vails.  [IEEE83] 

LOSS  EVENT:  In  fault  tree  analysis,  the  critical 
failure  which  is  assumed  to  have  occurred  and 
which  forms  the  root  of  the  fault  tree. 


MAINTAINABILITY: (1) The probability that
specified  unavailable  functions  can  be  repaired  or 
restored  to  their  operational  state  in  the  system’s 
intended  maintenance  environment  during  a 
specified  period  of  time.  [DeMiSS]  (2)  The  aver¬ 
age  effort  to  locate  and  fix  a  software  failure. 
[RADC83]. 

MANUAL  REVIEWS:  See  INSPECTION  and 
WALKTHROUGH. 

MATHEMATICAL  INDUCTION:  See  INDUCT¬ 
ION. 

MEAN TIME BETWEEN FAILURES (MTBF):
The sum of mean time to failure and mean time to
repair.

MEAN TIME TO FAILURE (MTTF): (1) A
measure of the time elapsed before a failure
occurs where units of time may reflect either
execution time or calendar time. Used as an
indication of software reliability. (2) Expected value of
the failure interval. [Musa87]

MEAN  TIME  TO  REPAIR  (MTTR):  Expected 
value  of  the  time  required  to  restore  to  normal 
operation.  [Musa87] 
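
The relationship among MTBF, MTTF, and MTTR can be sketched in Python. The sketch uses sample means of observed failure intervals and repair times as estimates of the expected values, which is an assumption; the function names are chosen for this example.

```python
from statistics import mean

def mttf(failure_intervals: list[float]) -> float:
    """Estimate mean time to failure from observed failure intervals."""
    return mean(failure_intervals)

def mttr(repair_times: list[float]) -> float:
    """Estimate mean time to repair from observed repair times."""
    return mean(repair_times)

def mtbf(failure_intervals: list[float], repair_times: list[float]) -> float:
    """Mean time between failures: the sum of MTTF and MTTR."""
    return mttf(failure_intervals) + mttr(repair_times)
```

The intervals and repair times must be expressed in the same unit, whether execution time or calendar time.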

MEASURE:  To  ascertain  or  appraise  by  compar¬ 
ing  to  a  standard;  to  apply  a  metric.  [IEEE88] 

MEASUREMENT:  1)  the  act  or  process  of 
measuring;  2)  a  figure,  extent,  or  amount 
obtained  by  measuring.  [IEEE88] 

METRIC:  A  measure  of  the  extent  or  degree  to 
which  a  product  possesses  and  exhibits  a  certain 
quality,  property,  or  attribute.  [IEEE83] 

METRICS  DATA  BASE:  An  organized  collec¬ 
tion  of  factor  values  and  corresponding  metric 
values. [IEEE88]

METRICS  FRAMEWORK:  A  tool  used  for 
organizing,  selecting,  communicating  and 
evaluating  the  required  quality  attributes  for  a 
software  system;  a  hierarchical  breakdown  of  fac¬ 
tors,  sub-factors,  and  metrics  for  a  software  sys¬ 
tem.  [IEEE88] 

METRICS  METHODOLOGY:  A  systematic 
approach  to  establishing  quality  requirements  and 
identifying, implementing, analyzing, and
validating software quality metrics for a software
system. [IEEE88]

METRICS  PLAN:  A  document  that  contains  a 
complete  software  quality  metrics  framework  for 
a  system,  the  set  of  documented  metrics,  and  the 
set  of  documented  data  items.  [IEEE88] 

METRICS  SAMPLE:  A  set  of  metrics  values 
which  is  drawn  from  the  metrics  data  base  and 
used  in  metrics  validation.  [IEEE88] 

METRICS  VALIDATION:  The  act  or  process  of 
ensuring  that  a  metric  correctly  predicts  a  quality 
factor.  [IEEE88] 


METRIC  VALUE:  An  element  from  the  range  of 
a  metric;  a  metric  output.  [IEEE88] 

MISSING  PATH  FAULT:  Occurs  when  a  special 
case  requires  a  unique  sequence  of  actions,  but 
the program does not contain a corresponding
path. This type of fault is caused by missing
conditional statements.

MODE:  A  way  of  operating  a  program  to  perform 
a  certain  subset  of  the  functions  that  the  entire 
program  can  perform,  as  selected  by  control  data 
or  operating  conditions.  Often,  the  mode  of  a 
program  will  be  defined  as  program  states,  with 
transitions  annotated  to  delineate  events  causing 
the  passages  between  modes  of  operation. 
[DACS79] 

MODULARITY: Those attributes of the software
which  provide  a  structure  of  highly  cohesive 
modules  with  optimum  coupling.  [RADC83] 

MODULE: A module is a separately invocable
element of a software system. [Mill81]

MODULE-INTERFACE  ANALYSIS:  A  form  of 
interface  analysis  which  examines  the  interfaces 
between  system  components  for  consistency, 
completeness,  and  redundancy. 

MOTHRA:  An  automated  testing  system  which 
applies  mutation  testing,  structural  testing,  and  a 
form  of  functional  testing  to  FORTRAN  pro¬ 
grams. 

MULTI-LEVEL  SECURITY:  A  mode  of  opera¬ 
tion  permitting  data  at  various  security  levels  to 
be  concurrently  stored  and  processed  in  a  com¬ 
puter  system,  when  at  least  some  users  have  nei¬ 
ther  the  clearance  nor  the  need-to-know  for  all 
data  contained  in  the  system.  [IEEE83] 

MULTI-TASKING:  A  method  of  describing  con¬ 
current  programs  as  collections  of  separate  tasks. 

MULTI-UNIT TEST: Consists of a unit test of a
single module in the presence of other modules.
It includes (1) a collection of settings for the input
space of the module and all the other modules
invoked by it, but (2) precisely one invocation of
the module under test. [Mill81]

MUTANT: A mutated form of a program
produced by applying an error operator which
introduces a predefined fault into a program
statement.

MUTATION  ANALYSIS:  See  MUTATION 
TESTING. 

MUTATION TESTING: A method to determine
test set thoroughness by measuring the extent to
which a test set can discriminate the program
from slight variants of the program. [Adri82]
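
A minimal Python sketch of this measurement follows. The program under test, its two hand-written mutants, and the use of the fraction of mutants killed as the thoroughness measure are all assumptions made for illustration; real mutation systems generate mutants automatically by applying error operators.

```python
def larger(a: int, b: int) -> int:
    """Program under test: return the larger of two integers."""
    return a if a > b else b

# Slight variants of the program, each containing one introduced fault.
MUTANTS = [
    lambda a, b: a if a < b else b,   # relational operator changed
    lambda a, b: a if a > b else a,   # operand replaced
]

def is_killed(mutant, inputs) -> bool:
    """A mutant is killed if some test input distinguishes it from the program."""
    return any(mutant(a, b) != larger(a, b) for a, b in inputs)

def mutation_score(mutants, inputs) -> float:
    """Fraction of mutants that the test set kills."""
    return sum(is_killed(m, inputs) for m in mutants) / len(mutants)
```

A test set containing only the input (2, 2) kills neither mutant, while adding (3, 1) and (1, 3) kills both, which shows how the score separates a weak test set from a more thorough one.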

MUTATION  TRANSFORMATION:  See  ERROR 
OPERATOR. 

MUTUAL  EXCLUSION:  Mutual  exclusion 
occurs  when  each  process  accessing  shared  data 
excludes  all  others  from  doing  so  simultaneously. 
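
As a hedged illustration, the Python sketch below uses `threading.Lock` from the standard library so that only one thread at a time updates a shared counter. The function name `locked_count` and the counter workload are invented for this example.

```python
import threading

def locked_count(n_threads: int, increments: int) -> int:
    """Increment a shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            # Only one thread at a time may execute this block.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter
```

With the lock in place the result always equals `n_threads * increments`; without it, concurrent updates to the shared data could be lost.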

N-VERSION PROGRAMMING: The independent
generation of N ≥ 2 functionally equivalent
programs from the same initial specification. The
N programs possess all the necessary attributes
for concurrent execution, during which comparison
vectors are generated by the program at
certain points. [DACS79]
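
The idea can be sketched in Python as follows. The sketch runs the versions one after another rather than concurrently and decides by majority vote over the comparison vector; both simplifications, and the three toy versions, are assumptions, since the definition does not specify a decision algorithm.

```python
from collections import Counter

def n_version_execute(versions, x):
    """Run every version on x and return the majority result."""
    comparison_vector = [version(x) for version in versions]
    value, count = Counter(comparison_vector).most_common(1)[0]
    if count <= len(comparison_vector) // 2:
        raise RuntimeError("no majority among the versions")
    return value

# Three versions of a squaring routine; the third contains a fault.
VERSIONS = [
    lambda x: x * x,
    lambda x: x ** 2,
    lambda x: x + x,
]
```

For the input 3 the faulty version returns 6, but the two correct versions outvote it and the result is 9.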

NETWORK  OF  PROCESSES:  A  set  of 
processes  executing  in  parallel  and  communicat¬ 
ing  via  a  communication  channel. 

NODE:  1)  A  number  assigned  to  a  place  within  a 
program  text.  Generally,  nodes  are  assigned  only 
to executable statements. [Mill81] 2) A vertex in a
graph. 

NON-EXECUTABLE  PATH:  See  INFEASIBLE 
PATH. 

NON-EXECUTABLE  STATEMENT:  A  declara¬ 
tion  or  directive  within  a  module  which  does  not 
produce  (during  compilation)  object  code 
instructions directly. [Mill81]

NON-INTERFERENCE:  In  verification  of  paral¬ 
lel  programs,  those  assertions  which  will  be  valid 
regardless  of  the  manner  in  which  the  programs 
interact. 


NON-PROCEDURAL  LANGUAGE:  Those 
languages  which  do  not  have  procedure  call  state¬ 
ments  in  their  syntax. 

NONHOMOGENEOUS  POISSON  PROCESS 
MODEL:  A  reliability  model  developed  by  Goel 
and  Okumoto. 

NP-COMPLETE: A decision problem in NP to
which every problem in NP can be reduced in
polynomial time. No polynomial-time solution is
known for any NP-complete problem.


OPERAND  SUBSTITUTION  TERMS:  In  the 
EQUATE  system,  the  set  of  expressions  that  can 
be  formed  by  substituting  any  member  of  the 
expression  set  for  any  subexpression  of  another 
expression  set  member. 

OPERATING  MODE:  See  MODE. 

OPERATING  SYSTEM:  Software  that  controls 
the  execution  of  programs.  An  operating  system 
may provide services such as resource allocation,
scheduling, input/output control, and data
management.  Although  operating  systems  are 
predominantly  software,  partial  or  complete 
hardware  implementations  are  possible.  An 
operating  system  provides  support  in  a  single  spot 
rather  than  forcing  each  program  to  be  concerned 
with  controlling  hardware.  [IEEE83] 

OPERATIONAL:  The  status  given  a  software 
package  once  it  has  completed  contractor  testing 
and  it  is  turned  over  to  the  eventual  user  for  use 
in  the  applications  environment.  [DACS79] 

OPERATIONAL  ENVIRONMENT:  The  environ¬ 
ment  in  which  a  system  or  system  component  will 
be  deployed  and  operate. 

OPERATIONAL  PROFILE:  The  expected  run 
time  distribution  of  inputs  to  a  program. 

OPERATIONAL  RELIABILITY:  The  reliability 
of  a  system  or  software  subsystem  in  its  actual  use 
environment.  Operational  reliability  may  differ 
considerably  from  reliability  in  the  specified  or 
test  environment.  [IEEE83]. 

OPERATIONAL SOFTWARE: See OPERATIONAL.

OPERATIONAL REQUIREMENTS: Qualitative
and  quantitative  parameters  which  specify  the 
desired  operational  capabilities  of  a  system  and 
which  will  serve  as  a  basis  for  determining  the 
operational  effectiveness  and  suitability  of  a  sys¬ 
tem  prior  to  deployment. 

OPERATIONAL  TEST  AND  EVALUATION: 
Formal  testing  conducted  prior  to  deployment  to 
evaluate  the  operational  effectiveness  and  suita¬ 
bility  of  a  system  with  respect  to  its  mission. 

OPERATIONAL  TESTING:  Testing  performed 
by  the  end  user  on  software  in  its  normal  operat¬ 
ing  environment.  (DOD  usage)  [IEEE83] 

OPERATOR: (1) In symbol manipulation, a symbol
that represents the action to be performed in
an operation. Examples of operators are +, -, *,
and /. (2) In the description of a process, that
which indicates the action to be performed on
operands. [IEEE83]

OPERATOR-INTERFACE  ANALYSIS:  A  form 
of  interface  analysis  which  examines  the  usage  of 
operators  applied  to  data  structures. 

ORACLE:  A  mechanism  to  produce  the 
“correct”  responses  to  compare  with  the  actual 
responses  of  the  software  under  test.  [Adri82] 
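
One common form of oracle is a trusted reference implementation. In the Python sketch below, the built-in `sorted` serves as the oracle for a hand-written insertion sort. The function names and the choice of sorting as the example are assumptions made for illustration.

```python
def insertion_sort(items: list[int]) -> list[int]:
    """Software under test: return a sorted copy of items."""
    result: list[int] = []
    for item in items:
        position = len(result)
        # Walk left past every element larger than the new item.
        while position > 0 and result[position - 1] > item:
            position -= 1
        result.insert(position, item)
    return result

def check_against_oracle(implementation, oracle, inputs):
    """Return the inputs on which the actual response differs from the oracle's."""
    return [data for data in inputs
            if implementation(list(data)) != oracle(list(data))]
```

An empty list from `check_against_oracle` means every actual response matched the oracle's response for the inputs tried.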

OUTPUT:  See  PROGRAM  OUTPUT. 

OUTPUT  ASSERTION:  A  logical  expression 
specifying  one  or  more  conditions  that  program 
outputs  must  satisfy  in  order  for  the  program  to 
be  correct.  [IEEE83] 

OUTPUT  CLAUSE:  See  OUTPUT  ASSERTION. 

OUTPUT  CONDITION:  See  OUTPUT  ASSER¬ 
TION. 

OUTPUT  SPACE:  Consists  of  the  collection  of 
variables,  including  file  actions,  which  are  (or 
could  be)  modified  by  some  invocation  of  the 
module. [Mill81]

OUTSIDE-IN TESTING: A strategy for
integration testing where units handling program
inputs and outputs are tested first, and units
which process the inputs to produce outputs
are incrementally included as the system is
integrated.


PAGING:  The  technique  of  repeatedly  using  the 
same areas of internal storage during different
stages  of  program  execution.  [IEEE83] 

PARAMETER  ESTIMATION:  The  process  of 
establishing  parameter  values  for  a  model. 

PARAMETER  PREDICTION:  Determination  of 
parameter  values  from  characteristics  of  the 
software  product  and  the  development  process. 
[Musa87] 

PARTIAL  CORRECTNESS:  Conditional  or  par¬ 
tial  correctness  of  a  program  (as  opposed  to  total 
correctness)  is  obtained  when  proving  the 
correctness  of  a  program  is  based  on  the  assump¬ 
tion  that  the  program  terminates. 

PARTITION  ANALYSIS:  A  program  testing  and 
verification  technique  which  employs  symbolic 
evaluation  to  provide  common  representations  of 
a  program’s  specification  and  implementation. 
See  also  PARTITION  ANALYSIS  TESTING, 
PARTITION  ANALYSIS  VERIFICATION. 

PARTITION  ANALYSIS  TESTING:  The  test 
data  selection  process  used  in  partition  analysis 
to  generate  test  data  based  on  analysis  of  both  the 
program  specification  and  implementation. 

PARTITION  ANALYSIS  VERIFICATION:  The 
verification  process  used  in  partition  analysis 
which  attempts  to  determine  the  consistency  pro¬ 
perties  that  hold  between  a  program  specification 
and  its  implementation. 

PATH: A sequence of segments. [Mill81]

PATH ANALYSIS: Program analysis performed
to identify all possible paths through a program,
to detect incomplete paths, or to discover
portions of the program that are not on any path.
[IEEE83]

PATH COMPUTATION: The function that is
computed by the sequence of executable
statements along a path. Symbolic evaluation gives a
path computation as a vector of the algebraic
expressions for the output values.

PATH  DOMAIN:  Corresponds  to  a  particular 
execution  path  in  a  program  and  consists  of  the 
input  data  points  that  cause  the  path  to  be  exe¬ 
cuted. 

PATH  DOMAIN  BOUNDARY:  The  boundary  of 
a  path  domain  determined  by  the  predicate 
interpretations  in  the  path  condition. 

PATH  EXPRESSION:  A  logical  expression  indi¬ 
cating  the  input  conditions  that  must  be  met  in 
order  for  a  particular  program  path  to  be  exe¬ 
cuted.  [IEEE83] 
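
As a hedged illustration of how a path expression determines a path domain, the sketch below uses a hypothetical three-path routine. The routine, the path labels, and the helper names are assumptions made for the example. Each path expression is the conjunction of the predicate outcomes along one path, and the path domain is the set of input points satisfying it.

```python
def classify(x, y):
    """Hypothetical routine with three execution paths."""
    if x > 0:              # first predicate
        if y > x:          # second predicate
            return "A"     # path A
        return "B"         # path B
    return "C"             # path C

# One path expression per path, written over the input variables.
PATH_EXPRESSIONS = {
    "A": lambda x, y: x > 0 and y > x,
    "B": lambda x, y: x > 0 and not (y > x),
    "C": lambda x, y: not (x > 0),
}

def path_domain(label, points):
    """Return the input points in `points` that drive execution down path `label`."""
    expression = PATH_EXPRESSIONS[label]
    return [(x, y) for (x, y) in points if expression(x, y)]
```

Because every input follows exactly one path, the three path domains partition the input space; the lines x = 0 and y = x are the path domain boundaries in this example.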

PATH  SELECTION  ADEQUACY  CRITERIA: 
Criteria  which  can  be  used  to  assess  the  adequacy 
of  executing  a  given  set  of  program  paths  for 
detecting  a  specified  set  of  potential  faults. 

PATH  SELECTION  CRITERIA:  Criteria  which 
specify  the  set  of  paths  to  be  executed  during  pro¬ 
gram  testing. 

PATH  SELECTION  ERROR:  Occurs  when  a 
program  incorrectly  determines  the  conditions 
under  which  a  path  is  executed.  This  may  be  due 
to  an  incorrect  conditional  statement  or  an 
incorrect  assignment  statement  that  affects  a  con¬ 
ditional  statement. 

PATH  TESTING:  A  test  method  satisfying  coverage
criteria  that  each  logical  path  through  the  program
be  tested.  Often  paths  through  the  program
are  grouped  into  a  finite  set  of  classes;  one  path 
from  each  class  is  tested.  [Adri82] 

PERFORMANCE:  The  ability  of  a  computer  sys¬ 
tem  or  subsystem  to  perform  its  functions. 
[IEEE83] 

PERFORMANCE  REQUIREMENT:  A  require¬ 
ment  that  specifies  a  performance  characteristic 
that  a  system  or  system  component  must  possess; 
for  example,  speed,  accuracy,  frequency. 
[IEEE83] 


PERTURBING  FUNCTION:  A  term  added  to 
arithmetic  expressions  to  introduce  a  known 
fault.  Used  in  perturbation  testing  to  determine 
whether  particular  potential  faults  may  go 
undetected  by  a  given  test  path. 

PERTURBATION  TESTING:  A  test  path  ade¬ 
quacy  measurement  technique  that  proposes 
using  the  reduction  of  the  space  of  undetectable 
faults  as  a  criterion  for  test  path  selection  and  is 
intended  to  reveal  faults  in  arithmetic  expres¬ 
sions. 

PETRI  NET:  A  method  of  analyzing  state  transi¬ 
tions. 

PIECE-WISE  EXPONENTIALLY  DISTRI¬ 
BUTED:  Applied  to  the  distribution  of  the  execu¬ 
tion  time  between  failures  means  that  the  hazard 
rate  is  a  constant  that  changes  only  at  each  error 
correction. 
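
A minimal sketch of such a distribution follows. It assumes, purely for illustration, a Jelinski-Moranda style hazard rate in which each correction removes one fault and every remaining fault contributes the same amount; the function and parameter names are hypothetical.

```python
import random

def hazard_rates(initial_faults, per_fault_rate):
    """Constant hazard rate in effect during each interval between failures.

    Assumption: each correction removes exactly one fault, so the rate
    steps down by `per_fault_rate` at every correction.
    """
    return [per_fault_rate * (initial_faults - i) for i in range(initial_faults)]

def sample_times_between_failures(initial_faults, per_fault_rate, rng):
    """Draw one execution time per interval, exponential with that interval's rate."""
    return [rng.expovariate(rate)
            for rate in hazard_rates(initial_faults, per_fault_rate)]
```

Within one interval the time to the next failure is exponentially distributed; across intervals the rate changes, which is what makes the overall distribution piece-wise exponential.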

PORTABILITY:  The  ease  with  which  software 
can  be  transferred  from  one  computer  system  or 
environment  to  another.  [IEEE83]

POST-ASSERTION:  An  assertion  attached  to
the  end  of  a  program  being  verified  which  is 
expected  to  be  satisfied  whenever  execution 
passes  this  point. 

POTENTIALITY:  A  program  has  the  potential  to 
establish  predicate  R  in  the  computation  started 
in  state  S  if  and  only  if  there  exists  a  finite 
sequence  r  such  that  r  is  an  initial  section  of  some
sequence  in  the  computation  history  of  state  S
and  R  holds  in  the  last  state  of  r.

PRECISE  INTERFACE  CONTROL:  An 
approach  to  interface  analysis  which  uses  requisi¬ 
tion  of  access  and  provision  of  access  concepts  to 
extend  the  traditional  notion  of  visibility. 

PREDICATE:  A  logical  formula  involving
variables/constants  known  to  a  module.  [Mill81]

PREDICATE  CALCULUS:  A  first-order 
language  in  which  one  can  make  general  state¬ 
ments  about  all  objects  in  a  fixed  set  called  the 
universe.  The  formulae  in  this  language  are  con¬ 
structed  out  of  names  for  relations  and  names  for 
individual  objects  in  the  universe.

PREDICATE  INTERPRETATION:  A  constraint
equivalent  to  a  program  predicate  where  program
variables  are  replaced  by  their  symbolic  values  in
terms  of  input  variables. 

PREDICATE  TRANSFORMER:  A  function  that 
maps  an  assertion  and  a  syntactic  unit  into 
another  assertion. 

PREDICTIVE  ASSESSMENT:  The  process  of 
using  a  predictor  metric(s)  to  predict  the  values 
of  another  metric.  [IEEE88] 

PREDICTIVE  METRIC:  A  metric  which  is  used 
to  predict  the  values  of  another  metric.  [IEEE88] 

PROBABILITY:  The  fraction  of  occasions  on 
which  a  specified  value  or  set  of  values  of  a  quan¬ 
tity  occurs,  out  of  all  possible  values  for  that 
quantity.  [Musa87] 

PROBABILITY  DENSITY:  Probability  per  unit 
variation  of  random  variable.  [Musa87]

PROBABILITY  DISTRIBUTION:  The  set  of 
probabilities  corresponding  to  the  values  that  a 
random  variable  can  take  on.  [Musa87] 

PROCEDURE  SUBDOMAIN:  A  partition  of  a 
program’s  input  data  such  that  the  elements  of 
the  subdomain  are  treated  uniformly  by  the 
specification  and  processed  uniformly  by  the 
implementation.  Used  in  Partition  Analysis. 

PROCESS:  In  a  computer  system,  a  unique, 
finite  course  of  events  defined  by  its  purpose  or 
by  its  effect,  achieved  under  given  conditions.
[IEEE83] 

PROCESS  AUGMENTED  FLOWGRAPH:  An
annotated  graphical  representation  of  communi¬ 
cating  concurrent  processes  formed  by  connect¬ 
ing  the  graphs  representing  the  individual 
processes  with  special  edges  indicating  all  syn¬ 
chronization  constraints. 

PROCESS  CONTROL  SYSTEM:  A  system 
embedded  in  some  larger  system  that  interacts 
with  external  devices  or  objects  to  control
ongoing  external  processes.

PROCESS  STEP:  Any  task  performed  in  the 
development,  implementation  or  maintenance  of 
software  (e.g.,  identify  the  software  components 
of  a  system  as  part  of  the  design).  [IEEE88]

PROCESS  METRIC:  Metric  used  to  measure 
characteristics  of  the  methods,  techniques,  and 
tools  employed  in  acquiring,  developing,  verify¬ 
ing,  and  operating  the  software  system.  [IEEE88] 

PROCESS  PROGRAMMING:  The  specification 
of  software  development  processes  in  a  pro¬ 
cedural  manner  (for  example,  a  programming 
language)  which  serves  to  formalize  and  commun¬ 
icate  these  processes,  facilitate  their  analysis,  and 
define  the  necessary  interactions  and  interfaces 
between  automated  and  manual  actions. 

PROFILE:  A  compendium  of  information  which 
contributes  to  the  definition  of  an  environment. 
[DACS79] 

PRODUCT  METRIC:  Metric  used  to  measure 
the  characteristics  of  the  documentation  and 
code.  [IEEE88]

PROGRAM:  See  Module. 

PROGRAM  BLOCK:  In  problem-oriented 
languages,  a  computer  program  subdivision  that 
serves  to  group  related  statements,  delimit  rou¬ 
tines,  specify  storage  allocation,  delineate  the 
applicability  of  labels,  or  segment  paths  of  the 
computer  program  for  other  purposes. 

PROGRAM  COMPONENT:  See  COMPONENT. 

PROGRAM  COUNTER:  A  variable  which  indi¬ 
cates  the  program  statement  currently  being  exe¬ 
cuted. 

PROGRAM  CORRECTNESS:  (1)  The  extent  to 
which  software  is  free  from  design  defects  and 
coding  defects;  that  is,  fault  free.  [IEEE83].  (2) 
Extent  to  which  the  software  satisfies  its 
specifications  and  fulfills  the  user’s  mission 
objectives.  [RADC83].  (3)  If  for  all  initial  states  that
belong  to  the  set  of  legitimate  initial  states,  the 
program  P  terminates  with  a  final  state  that 
belongs  to  the  set  of  legitimate  final  states,  then
program  P  exhibits  program  correctness. 

PROGRAM  DEBUGGING:  See  DEBUGGING. 

PROGRAM  GRAPH:  Graphical  representation 
of  a  program.  [Adri82]

PROGRAM  INSTRUMENTATION:  (1)  Probes, 
such  as  instructions  or  assertions,  inserted  into  a 
computer  program  to  facilitate  execution
monitoring,  proof  of  correctness,  resource
monitoring,  or  other  activities.  (2)  The  process  of
preparing  and  inserting  probes  into  a  computer
program.  [IEEE83]

PROGRAM  PATH:  See  PATH. 

PROGRAM  PREDICATE:  See  PREDICATE. 

PROGRAM  PROVING:  The  act  of  demonstrat¬ 
ing  that  a  program  is  correct. 

PROGRAM  SPECIFICATION:  The  formalization
that  precisely  states  the  requirements  and
objectives  which  the  program  is  to  satisfy. 

PROGRAM  TESTING:  See  TESTING. 

PROGRAM  TEXT:  The  set  of  statements,  exe¬ 
cutable  and  non-executable,  which  make  up  a 
module.  Program  text  is  expressed  in  a
programming  language.  [Mill81]

PROGRAM  TRACE:  A  record  of  the  execution 
of  a  computer  program;  it  states  the  sequence  in 
which  the  instructions  were  executed. 

PROGRAM  TRANSFORMATION:  To  replace 
one  segment  of  a  program  description  by  another, 
equivalent  description.  [DACS79] 

PROGRAM  VERIFICATION:  The  act  of  demon¬ 
strating  that  a  program  achieves  some  intended 
purpose. 

PROGRAM  UNIT  INVOCATION:  See  INVOCA¬ 
TION. 

PROGRAMMING  LANGUAGE:  An  artificial 
language  designed  to  generate  or  express
programs.  [IEEE83]

PROOF:  A  structure  of  valid  applications  of 
inference  rules  to  obtain  a  conclusion  (proof). 

PROOF  JUSTIFICATION:  Concerns  establish¬ 
ing,  in  a  precise  algorithmic  notation,  the  reason¬ 
ing  required  to  determine  the  validity  of  the  asser¬ 
tions. 

PROOF  CHECKER:  A  program  that  checks  for¬ 
mal  proofs  of  program  properties  for  logical 
correctness. 

PROOF  OF  CORRECTNESS:  A  formal  technique
used  to  prove  mathematically  that  a  program
satisfies  its  specifications.  [IEEE83]

PROOFS  OF  PROGRAMS:  See  PROOF  OF 
CORRECTNESS. 

PROOF  OF  SOUNDNESS:  Proof  that  all  statements
in  the  theory  that  are  derived  from
theorems  (true  statements)  by  rules  of  inference 
of  the  theory  must  be  true. 

PROOF  RULES:  The  inference  rules  that  permit 
the  derivation  of  more  complex  theorems,  includ¬ 
ing  theorems  about  the  semantics  of  a  complete 
program. 

PROTOTYPE:  A  limited  implementation  of  a 
system  built  in  order  to  capture  or  validate  some
aspects  of  a  system  design.  The  fundamental  con¬ 
cept  is  that  a  prototype  of  a  system  is  more  chea¬ 
ply  or  more  quickly  constructed  than  the  actual 
system.  Hence,  some  aspects  of  function  or  exe¬ 
cution  speeds  are  typically  sacrificed. 

PROTOTYPING:  A  discipline  of  system  design 
where  the  function  of  the  actual  system  is  cap¬ 
tured  in  a  series  of  increasingly  accurate  proto¬ 
types. 

PROVISION  OF  ACCESS:  In  the  AdaPIC  system,
provision  of  access  occurs  when  an  entity
grants  the  right  of  reference,  or  use,  to  some  set 
of  entities. 

PURIFICATION  DEGREE:  The  ratio  of  change 
in  the  hazard  rate  function  from  the  beginning  of 
testing  to  the  end  versus  what  it  was  at  the  begin¬
ning. 
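
Read literally, the definition is the relative drop in hazard rate over the test period. The sketch below is one hedged reading of it; the function name and argument names are hypothetical.

```python
def purification_degree(hazard_at_start, hazard_at_end):
    """Relative reduction in hazard rate achieved during testing.

    Computes (z_start - z_end) / z_start, where z_start and z_end are the
    hazard rates at the beginning and at the end of testing.
    """
    if hazard_at_start <= 0:
        raise ValueError("hazard rate at the start of testing must be positive")
    return (hazard_at_start - hazard_at_end) / hazard_at_start
```

Under this reading a value of 0 means testing did not change the hazard rate, and a value of 1 means the hazard rate was driven to zero.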


QUANTIFIER-FREE  FORMULAE:  Formulae  in 
which  there  are  no  operations  that  bind  the  vari¬ 
ables  in  a  logical  formula  by  specifying  their  quan¬ 
tity. 

QUALIFICATION  TESTING:  Formal  testing, 
usually  conducted  by  the  developer  for  the  custo¬ 
mer,  to  demonstrate  that  the  software  meets  its 
specified  requirements.  See  also  SYSTEM  TEST¬ 
ING. 

QUALITY:  The  degree  to  which  a  program
possesses  a  desired  combination  of  attributes  that 
enable  it  to  perform  its  specified  end  use. 

QUALITY  ASSURANCE:  A  planned  and  sys¬ 
tematic  pattern  of  all  actions  necessary  to  provide 
adequate  confidence  that  the  item  or  product 
conforms  to  established  technical  requirements. 
[IEEE83] 

QUALITY  ASSURANCE  METHOD:  An 
approach  for  reducing  the  risk  associated  with  a 
software  system  and  one  or  more  properties. 

QUALITY  ATTRIBUTE:  A  characteristic  of 
software;  a  generic  term  applying  to  factors,  sub¬ 
factors,  or  metric  values.  [IEEE88] 

QUALITY  SUB-FACTOR:  A  decomposition  of  a 
quality  factor  or  quality  sub-factor.  [IEEE88] 

QUALITY  FACTOR:  An  attribute  of  software 
that  contributes  to  its  quality.  [IEEE88] 

QUALITY  REQUIREMENT:  A  requirement  that 
a  software  attribute  be  present  in  software  to 
satisfy  a  contract,  standard,  specification,  or 
other  formally  imposed  document.  [IEEE88] 

RANDOM:  Possessing  the  property  of  having 
more  than  one  value  at  one  time,  each  occurring 
with  some  probability.  [Musa87] 

RANDOM  TESTING:  An  essentially  black-box 
testing  approach  in  which  a  program  is  tested  by
randomly  choosing  a  subset  of  all  possible  input
values.  The  distribution  may  be  arbitrary  or  may 
attempt  to  accurately  reflect  the  distribution  of 
inputs  in  the  application  environment. 
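
A minimal sketch of the approach, assuming a reference oracle is available to judge each output. The harness, the deliberately faulty program, and the input distribution are all hypothetical choices made for the example.

```python
import random

def random_test(program, oracle, draw_input, trials, seed=0):
    """Run `program` on randomly drawn inputs and collect the failing inputs.

    `draw_input` takes a random.Random instance and returns one input; it
    encodes the chosen input distribution (arbitrary or operational).
    """
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        x = draw_input(rng)
        if program(x) != oracle(x):
            failures.append(x)
    return failures

def buggy_abs(x):
    """Hypothetical program under test: wrong only for x == -3."""
    return x if x >= 0 or x == -3 else -x
```

For example, `random_test(buggy_abs, abs, lambda rng: rng.randint(-10, 10), 500)` draws inputs uniformly from a small range. Whether the single faulty input is hit depends on the distribution and the number of trials, which is the central trade-off of the technique.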

RANDOM  VARIABLE:  A  variable  that 
possesses  the  property  of  randomness  (see
RANDOM).  [Musa87]

REAL  TIME  CONSTRAINTS:  Those  constraints 
imposed  by  the  environment  in  which  the  system 
is  going  to  operate. 

REAL-TIME:  Pertaining  to  the  processing  of 
data  by  a  computer  in  connection  with  another 
process  outside  the  computer  according  to  time 
requirements  imposed  by  the  outside  process. 
This  term  is  also  used  to  describe  systems  operat¬ 
ing  in  conversational  mode,  and  processes  that 
can  be  influenced  by  human  intervention  while 
they  are  in  progress.  [IEEE88]

REASONING  SYSTEMS:  Systems  capable  of 
performing  the  deduction  of  logical  expressions 
from  other  logical  expressions. 

RECONFIGURATION:  Adjustment  of  the  rela¬ 
tionships  between  the  software  modules  in  a 
software  system  or  hardware  devices  in  a 
hardware  system. 

RECOVERY  BLOCK:  Software  fault  tolerance 
mechanism.  A  recovery  block  consists  of  a  con¬ 
ventional  [program]  block  which  is  provided  with 
a  means  of  error  detection  (an  acceptance  test) 
and  zero  or  more  stand-by  spares  (the  additional 
alternates).  [Rand75]. 
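
The structure can be sketched as follows. This is a simplified illustration, not the mechanism described in [Rand75]: it uses side-effect-free functions, so the state restoration that a real recovery block performs before trying an alternate is omitted. All names are hypothetical.

```python
import math

def recovery_block(acceptance_test, primary, *alternates):
    """Build a callable that tries the primary, then each alternate in turn.

    The first result that passes `acceptance_test(result, *args)` is returned.
    """
    def run(*args):
        for variant in (primary, *alternates):
            try:
                result = variant(*args)
            except Exception:
                continue                      # an exception counts as a detected error
            if acceptance_test(result, *args):
                return result
        raise RuntimeError("primary and all alternates failed the acceptance test")
    return run

def faulty_sqrt(x):
    """Hypothetical faulty primary block."""
    return x / 2

def close_to_root(result, x):
    """Acceptance test: the result squared must reproduce the input."""
    return abs(result * result - x) < 1e-9

safe_sqrt = recovery_block(close_to_root, faulty_sqrt, math.sqrt)
```

Calling `safe_sqrt(9.0)` rejects the primary's result and falls through to the stand-by spare. Note that the acceptance test can only detect errors it is written to check for.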

RECURSION:  An  initial  condition  is  defined, 
and  the  transformation  from  one  condition  to  the 
next  is  defined  in  terms  of  the  previously  defined 
conditions. 

RECURSION  INDUCTION:  To  prove  that  g=h 
(a)  show  that  g  and  h  satisfy  the  defining  equation 
for  some  other  function  /,  and  (b)  show  that  / 
holds  over  the  domain  of  interest. 

RECURSION  THEOREM:  A  theorem  about 
primitive  and  partial  recursive  functions  due  to 
Kleene.

RECURSIVE  FUNCTION  THEORY:  Each
recursive  function  is  defined  by  combining  some 
initial  functions  using  composition,  recursion, 
and  minimalization. 

RECURSIVE  PROGRAMS:  Those  programs 
that  have  or  use  recursive  procedures  or  func¬ 
tions. 

REFERENCE  ANALYSIS:  A  form  of  static  error 
analysis  which  can  detect  reference  anomalies; 
for  example,  when  a  variable  is  referenced  along 
a  program  path  before  it  is  assigned  a  value  along 
that  path. 

REGRESSION  TESTING:  Selective  retesting  to 
detect  faults  introduced  during  modification  of  a 
system  or  system  component,  to  verify  that 
modifications  have  not  caused  unintended 
adverse  effects,  or  verify  that  a  modified  system 
or  system  component  still  meets  its  specified 
requirements.  [IEEE83] 

REGULARITY  HYPOTHESIS:  The  regularity 
hypothesis  for  a  level  n  consists  in  assuming  that 
if  the  test  is  successful  for  data  of  complexity  less 
than  n,  then  the  program  behaves  correctly  for 
any  value. 

RELAY:  A  fault-based  test  data  selection  tech¬ 
nique  based  on  defining  revealing  conditions  that 
guarantee  that  a  fault  originates  failure  during 
execution  and  that  the  failure  transfers  through 
computations  and  data  until  it  is  revealed. 

RELIABILITY:  (1)  The  probability  that  the 
software  will  perform  as  intended  under  stated 
conditions  for  a  specified  period  of  time. 
[DeMi88]  (2)  The  probability  that  the  software
will  perform  its  logical  operations  in  the  specified 
environment  without  failure.  [RADC83] 

RELIABILITY  ASSESSMENT:  The  process  of 
determining  the  achieved  level  of  reliability  of  an 
existing  system  or  system  component. 

RELIABILITY  MODEL:  A  model  used  for
predicting,  estimating,  or  assessing  reliability. 
[IEEE83] 

RELIABILITY  GROWTH  MODEL:  A  reliability
model  which  takes  account  of  improvements  in
reliability  that  result  from  correcting  faults  in  the
software. 

RELIABLE  TEST  DATA:  A  set  of  test  data  T  is 
reliable  for  program  P  if  it  reveals  that  P  contains 
a  fault  whenever  P  is  incorrect.

RELIABLE  TEST  DATA  SELECTION  STRA¬ 
TEGY:  A  test  data  selection  strategy  is  reliable  if 
it  guarantees  to  generate  test  data  capable  of 
detecting  every  fault  in  a  program.

RENDEZVOUS:  The  interaction  that  occurs 
between  two  parallel  tasks  when  one  task  has 
called  an  entry  of  the  other  task,  and  a 
corresponding  accept  statement  is  being  executed 
by  the  other  task  on  behalf  of  the  calling  task. 
[IEEE83] 

REQUIREMENT:  A  condition  or  capability  that 
must  be  met  or  possessed  by  a  system  or  system 
component  to  satisfy  a  contract,  standard, 
specification,  or  other  formally  imposed  docu¬ 
ment.  The  set  of  all  requirements  forms  the  basis 
for  subsequent  development  of  the  system  or  sys¬ 
tem  component.  [IEEE83] 

REQUIREMENTS  LANGUAGE:  A  language 
used  to  provide  a  succinct  and  unambiguous 
specification  of  the  required  system  capabilities. 

REQUIREMENTS  SPECIFICATION:  A
specification  that  sets  forth  the  requirements  for
a  system  or  system  component;  for  example,  a
software  configuration  item.  Typically  included
are  functional  requirements,  performance 
requirements,  interface  requirements,  design 
requirements,  and  development  standards. 
[IEEE83] 

REQUISITION  OF  ACCESS:  In  the  AdaPIC  sys¬ 
tem,  requisition  of  access  occurs  when  an  entity 
requests  the  right  to  refer  to,  or  make  use  of,
some  set  of  entities. 

RESOURCE  ASSIGNMENT:  The  process  of 
granting  the  request  for  a  resource  by  a  task. 

RETESTING:  See  REGRESSION  TESTING.

REUSABILITY:  The  effort  to  convert  a  software
component  for  use  in  another  application.
[RADC83].

REUSABLE  THEORIES:  Formal  reasoning  rules 
that  can  be  used  by  more  than  one  system. 

REVEALING  SUBDOMAIN:  A  subset  of  a 
program’s  input  domain  is  revealing  if  the 
existence  of  one  incorrectly  processed  input 
implies  that  all  of  the  subset’s  elements  are  pro¬ 
cessed  incorrectly.  Conversely,  if  one  input  is  pro¬ 
cessed  correctly,  all  elements  in  the  subdomain 
are  processed  correctly. 

ROBUSTNESS:  The  extent  to  which  software  can 
continue  to  operate  correctly  despite  the  intro¬ 
duction  of  invalid  inputs.  [IEEE83] 

RUN-TIME:  The  instant  at  which  a  program 
begins  to  execute.  [IEEE83] 

RUN-TIME  ENVIRONMENT:  The  environment 
in  which  a  program  executes,  either  the  host  or 
target  environment. 

RUN-TIME  SYSTEM:  A  set  of  software  routines 
added  to  a  compiled  program,  typically  at  link 
time,  to  implement  the  semantics  intended  by  the 
compiler. 

RUN-TIME  SCHEDULER:  Software  which  allo¬ 
cates  processing  resource  to  parallel  tasks. 

SAFE  SYSTEM:  A  system  which  prevents  unsafe 
states  from  producing  safety  failures. 

SAFETY:  The  extent  to  which  the  program  is  pro¬
tected  from  exposure  to  a  specified  set  of 
hazards.  [DeMi88]

SAFETY  ANALYSIS:  Identification  of  the  possi¬ 
ble  causes,  and  evaluation  of  the  possible  conse¬ 
quences,  of  critical  system  failures.  Intended  to 
determine  the  necessary  fault  tolerance  or  other 
mechanisms  needed  to  ensure  safe  operation  in 
the  system  under  various  operating  conditions 
and  modes. 

SAFETY  FAILURE:  A  failure  which  leads  to 
casualties  or  otherwise  serious  consequences. 


SAFETY  PROPERTY:  A  program  property  that 
is  satisfied  if  conditions  or  actions  that  should 
never  happen  within  a  program  never  occur. 

SATISFIABILITY:  Concerns  the  existence  of  an 
interpretation  that  satisfies  the  verification  condi¬ 
tions  in  a  proof  of  correctness. 

SCHEDULER:  A  computer  program  which  allocates
resources  to  waiting  processes  to  allow
them  to  execute  in  an  efficient  or  prioritized 
manner. 

SCHEDULING  ALGORITHM:  A  set  of  rules 
used  to  determine  how  available  processing 
resources  should  be  allocated  to  parallel  tasks 
based  on  priorities  of  the  tasks. 

SCOPE:  The  range  within  which  an  identified 
unit  displays  itself.  Scope  of  activity  refers  to  the 
boundaries  within  which  a  data  structure  or  pro¬ 
gram  element  remains  an  integral  unit.  Scope  of 
control  refers  to  the  submodules  in  a  program 
that  potentially  may  execute  if  control  is  given  to 
a  cited  module.  Scope  of  error  denotes  the  set  of 
submodules  that  are  potentially  affected  by  the 
detection  of  a  fault  in  a  cited  module.  [DACS79] 

SECURITY:  The  extent  to  which  computer 
hardware,  software,  and  resident  information  and 
data  are  protected  from  specified  threats  such  as 
unauthorized  access,  use,  modification,  destruction,
transmission,  or  disclosure.  [DeMi88]

SEGMENT:  A  (logical)  segment  or  decision-to- 
decision  path,  is  the  set  of  statements  in  a  module 
which  are  executed  as  a  result  of  the  evaluation  of 
some  predicate  (conditional)  within  the  module. 
It  begins  at  an  entry  or  decision  statement  and 
ends  at  a  decision  statement  or  exit,  and  should 
be  thought  of  as  including  the  sensing  of  the  out¬ 
come  of  a  conditional  operation  and  the  subse¬ 
quent  statement  execution  up  to  and  including  the 
computation  of  the  next  predicate  value,  but  not 
including  its  evaluation.  [Mill81]

SELF-CHECKING  SOFTWARE:  Software 
which  makes  an  explicit  attempt  to  determine  its 
own  correctness  and  to  proceed  accordingly. 

SEMANTICS:  (1)  The  relationship  of  characters 
or  groups  of  characters  to  their  meanings,
independent  of  the  manner  of  their  interpretation 
and  use.  (2)  The  relationships  between  symbols 
and  their  meanings.  [IEEE83]

SEMANTIC  CHARACTERIZATION:  Determining
the  approach  used  in  the  formal  definition  of
the  semantics  of  programming  language  con¬ 
structs. 

SEMANTIC  VALUATION  FUNCTIONS: 
Semantic  valuation  functions  map  programming 
constructs  to  the  values  (numbers,  truth  values, 
function,  and  so  on)  that  they  denote. 

SENSITIVITY  ANALYSIS:  In  safety  analysis, 
that  analysis  which  assesses  the  potential  impact 
of  a  potentially  critical  failure  on  the  ability  of  the 
system  to  perform  its  mission. 

SENSITIVITY  FOCUS:  In  the  context  of  regression
testing,  the  concern  that  the  amount  of
retesting  required  after  a  software  change  is  pro¬ 
portional  to  the  extent  of  that  change. 

SEQUENTIAL  PROCESSES:  Processes  that  exe¬ 
cute  in  such  a  manner  that  one  must  finish  before 
the  next  begins.  [IEEE83] 

SEQUENTIAL  PROOF:  A  formal  proof  made 
for  sequential  processes. 

SHARED  VARIABLE:  A  variable  shared  by
more  than  one  process. 

SHARED  VARIABLE  COMMUNICATION:  A 
variable  shared  by  more  than  one  process  and 
used  to  communicate  between  processes. 

SIDE  EFFECT:  Processing  or  activities  per¬ 
formed,  or  results  obtained,  secondary  to  the  pri¬ 
mary  function  of  a  program,  subprogram,  or 
operation.  [IEEE83]

SIMULATION:  Use  of  an  executable  model  to 
represent  behavior  of  an  object.  The  computa¬ 
tional  hardware,  external  environment,  and  even 
code  segments  may  be  simulated.  [Adri82] 

SOFTWARE:  (1)  Computer  programs,  procedures,
rules,  and  possibly  associated  documentation  and
data  pertaining  to  the  operation  of  a  computer
system.  (2)  Programs,  procedures,  rules,  and  any
associated  documentation  pertaining  to  the
operation  of  a  computer  system.  Contrast  with
HARDWARE.  [IEEE83]

SOFTWARE  ASSURANCE:  See  QUALITY
ASSURANCE. 

SOFTWARE  COMPONENT:  General  term  used 
to  refer  to  an  element  of  a  software  system,  such 
as  module,  unit,  etc.  [IEEE88] 

SOFTWARE  ENGINEERING:  The  systematic 
approach  to  the  development,  operation,  mainte¬ 
nance,  and  retirement  of  software.  [IEEE83] 

SOFTWARE  FAULT  TREE  ANALYSIS:  A  form 
of  fault  tree  analysis  used  for  analyzing  the  safety 
of  software  designs  or  code. 

SOFTWARE  LIFE  CYCLE:  The  period  of  time 
that  starts  when  a  software  product  is  conceived 
and  ends  when  the  product  is  no  longer  available 
for  use.  The  software  life  cycle  typically  includes 
a  requirements  phase,  design  phase,  implementa¬ 
tion  phase,  test  phase,  installation  and  checkout 
phase,  operation  and  maintenance  phase,  and 
sometimes,  retirement  phase.  [IEEE83]

SOFTWARE  PRODUCT:  A  software  entity 
designated  for  delivery  to  a  user.  [IEEE83]

SOFTWARE  QUALITY:  (1)  The  totality  of
features  and  characteristics  of  a  software  product 
that  bear  on  its  ability  to  satisfy  given  needs;  for 
example,  conform  to  specifications.  (2)  The 
degree  to  which  software  possesses  a  desired 
combination  of  attributes.  (3)  The  degree  to 
which  a  customer  or  user  perceives  that  software 
meets  his  or  her  composite  expectations.  (4)  The 
composite  characteristics  of  software  that  deter¬ 
mine  the  degree  to  which  the  software  in  use  will 
meet  the  expectations  of  the  customer.  [IEEE83] 

SOFTWARE  QUALITY  INDICATORS:  Process
guidelines  in  the  form  of  detailed  data,  derived
from  scheduled  surveys,  inspections,  evaluations,
and  tests,  that  provide  insight  into  the  condition
of  a  product  or  process.

SOFTWARE  QUALITY  METRIC:  A  function
whose  inputs  are  software  data  and  whose  output 
is  a  single  (numerical)  value  that  can  be  inter¬ 
preted  as  the  degree  to  which  software  possesses 
a  given  attribute  that  affects  its  quality.  [IEEE88]

SOFTWARE  RELIABILITY:  (1)  The  probability
that  software  will  not  cause  the  failure  of  the
system  for  a  specified  time  under  specified
conditions.  The  probability  is  a  function  of  the  inputs
to  and  use  of  the  system  as  well  as  a  function  of
the  existence  of  faults  in  the  software.  The  inputs
to  the  system  determine  whether  existing  faults,  if
any,  are  encountered.  (2)  The  ability  of  a  program
to  perform  a  required  function  under  stated
conditions  for  a  stated  period  of  time.  [IEEE83]

SOFTWARE  RELIABILITY  MODEL:  See 
RELIABILITY  MODEL. 

SOFTWARE  REQUIREMENT:  See  REQUIRE¬ 
MENT. 

SOFTWARE  TOOL:  A  computer  program  used 
to  help  develop,  test,  analyze,  or  maintain 
another  computer  program  or  its  documentation; 
for  example,  automated  design  tool,  compiler, 
test  tool,  maintenance  tool.  [IEEE83] 

SOURCE  LANGUAGE:  (1)  A  language  used  to 
write  source  programs.  (2)  A  language  from 
which  statements  are  translated.  [IEEE83] 

SPECIAL:  The  state-machine  specification 
language  employed  by  the  Hierarchical  Development
Methodology  verification  system.

SPECIAL  VALUES:  Special  values  have  special 
mathematical  properties;  for  example,  zero,  one, 
a  very  small  value,  a  very  large  value. 

SPECIAL  VALUES  TESTING:  Testing  to  ensure 
proper  handling  of  all  special  values. 
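
A small sketch of special values testing for a floating-point input follows. The list of special values, the probing helper, and the function under test are hypothetical choices for illustration.

```python
import sys

# Hypothetical special values: zero, one, minus one, a very small value,
# and a very large value.
SPECIAL_VALUES = [0.0, 1.0, -1.0, sys.float_info.min, sys.float_info.max]

def probe_special_values(func, values=SPECIAL_VALUES):
    """Apply `func` to each special value and record the outcome.

    Returns a dict mapping each value to ("ok", result) or to
    ("error", exception type name).
    """
    report = {}
    for value in values:
        try:
            report[value] = ("ok", func(value))
        except Exception as exc:
            report[value] = ("error", type(exc).__name__)
    return report

def reciprocal(x):
    """Hypothetical function under test."""
    return 1.0 / x
```

The report makes the handling of each special value explicit; for `reciprocal`, the zero input surfaces as an error entry that the tester must judge against the specification.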

SPECIFICATION:  A  document  that  prescribes 
in  a  complete,  precise,  verifiable  manner,  the 
requirements,  design,  behavior,  or  other  charac¬ 
teristics  of  a  system  or  system  component. 
[IEEE83] 

SPECIFICATION  LANGUAGE:  A  language,
often  a  machine-processable  combination  of
natural  and  formal  language,  used  to  specify  the
requirements,  design,  behavior,  or  other
characteristics  of  a  system  or  system  component.
[IEEE83]

SPECIFICATION  MODEL:  A  model  used  to 
give  a  formal  specification  of  a  program. 

SPECIFICATION  MUTATION:  A  form  of  muta¬ 
tion  testing  which  is  applied  to  program 
specifications  to  determine  the  absence  or  pres¬ 
ence  of  a  predefined  set  of  potential  faults  in  the 
implementation  of  the  specification. 

STACK  FRAMES:  A  stack  element  of  a  push¬ 
down  stack  automaton. 

STANFORD  PASCAL  VERIFIER:  A  tool  which 
reasons  in  quantifier-free  first-order  predicate  cal¬ 
culus. 

STARVATION  FREEDOM:  Occurs  when  a  process
(which  is  not  blocked  or  deadlocked)  cannot
get  into  a  state  where  a  request  for  a  resource  will 
never  be  granted. 

STATE  TRANSITION:  A  change  from  one  pro¬ 
gram  state  to  another. 

STATE  TRANSITION  COVERAGE:  Member  of 
a  series  of  successively  more  stringent  testing  cov¬ 
erage  measures  for  concurrent  programs  analo¬ 
gous  to  structural  and  data  flow  testing  criteria  for 
sequential  programs.  See  also  CONCURRENCY 
STATE  COVERAGE,  SYNCHRONIZATION 
COVERAGE. 

STATE-MACHINE  LANGUAGE:  A  language 
accepted  by  a  finite  state  automaton. 

STATE-MACHINE  SPECIFICATION:  Defines  a 
set  of  functions  that  specify  transformations  on 
input.  The  set  of  functions  may  be  viewed  as 
defining  the  nature  of  the  abstract  data  type  or 
describing  the  behavior  of  an  abstract  machine. 

STATEMENT  COMPLEXITY:  A  complexity 
value  assigned  to  each  statement  which  is  based 
on  (1)  the  statement  type,  and  (2)  the  total  length 
of  postfix  representations  of  expressions  within 
the  statement  (if  any).  These  values  are  intended
to  represent  a  statement’s  potential  execution
time.  [Mill81]

STATEMENT  TESTING:  A  test  method  satisfy¬ 
ing  the  coverage  criterion  that  each  statement  of  a 
program  be  executed  at  least  once  during  pro¬ 
gram  testing.  [Adri82] 

STATIC  ANALYSIS:  The  process  of  evaluating  a 
program  without  executing  the  program. 
[IEEE83] 

STATIC  ANALYZER:  A  software  tool  that  aids 
in  the  evaluation  of  a  computer  program  without 
executing  the  program.  Examples  include  syntax 
checkers,  compilers,  cross-reference  generators, 
standards  enforcers,  and  flowcharters.  [IEEE83] 

STATIC  BINDING:  Binding  performed  prior  to 
execution  of  a  program  and 
not  subject  to  change  during  execution.  Contrast 
with  DYNAMIC  BINDING. 

STATIC  CONCURRENCY  ANALYSIS:  A  technique
for  determining  all  the  possible  synchronization
patterns  in  a  concurrent  program,  without
program  execution. 

STATIC  ERROR  ANALYSIS:  Analysis  of  a  pro¬ 
gram  to  determine  whether  certain  kinds  of  faults 
or  dangerous  conditions  are  present.  See  TYPE 
AND  UNITS  ANALYSIS,  REFERENCE 
ANALYSIS,  EXPRESSION  ANALYSIS,  INTER¬ 
FACE  ANALYSIS. 

STATICALLY  LINKED:  See  STATIC  BINDING. 

STATISTICAL  TESTING:  A  testing  approach 
which  employs  the  probability  distributions  of  the 
product  inputs  and  randomized  sampling  tech¬ 
niques  to  organize  test  material.  The  randomiza¬ 
tion  supports  statistical  inferences  about  the 
product’s operational characteristics and an estimate
of its expected reliability (MTTF).
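As a rough illustration of this approach, the sketch below samples inputs from an assumed operational profile and estimates the per-run probability of success. The function `estimate_reliability`, the profile, and the deliberately faulty `buggy_abs` are hypothetical; converting a per-run estimate into an MTTF figure would need additional assumptions about execution time that are not modeled here.

```python
import random

def estimate_reliability(program, oracle, profile, runs, seed=0):
    """Estimate the per-run success probability from randomized sampling.

    profile: list of (input_generator, probability) pairs; each generator
    takes a random.Random instance and returns one input.
    """
    rng = random.Random(seed)
    generators = [gen for gen, _ in profile]
    weights = [prob for _, prob in profile]
    failures = 0
    for _ in range(runs):
        # Pick an input class according to the profile, then draw an input from it.
        gen = rng.choices(generators, weights=weights, k=1)[0]
        x = gen(rng)
        if program(x) != oracle(x):
            failures += 1
    return 1 - failures / runs

def buggy_abs(x):
    # Seeded fault: small negative inputs are returned unchanged.
    return x if x >= -5 else -x

# Assumed operational profile: 90% non-negative inputs, 10% negative inputs.
profile = [
    (lambda rng: rng.randint(0, 100), 0.9),
    (lambda rng: rng.randint(-100, -1), 0.1),
]

reliability = estimate_reliability(buggy_abs, abs, profile, runs=20000)
```

Under this profile the fault is reached in roughly 0.5% of runs, so the estimate should come out close to 0.995.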

STRESS TESTING: See BOUNDARY VALUE
ANALYSIS.

STRONG MUTATION TESTING: See MUTATION
TESTING.


STRONG  TYPING:  A  programming  language 
feature  that  requires  the  data  type  of  each  data 
object  to  be  declared,  and  that  precludes  the 
application  of  operators  to  inappropriate  data 
objects  and,  thereby,  prevents  the  interaction  of 
data  objects  of  incompatible  types.  [IEEE83] 

STRUCTURAL  COVERAGE  MEASURE:  A 
measure  of  the  structural  coverage  accomplished 
during  testing  activities.  Usually  given  as  the  per¬ 
centage  of  program  statements,  branches,  or 
paths  which  have  been  executed. 

STRUCTURAL  INDUCTION:  A  formal  proof 
method  using  recursive  induction  upon  the  struc¬ 
ture  of  the  data  manipulated  by  a  program. 

STRUCTURAL  TESTING:  A  testing  method 
where  the  test  data  are  derived  solely  from  the 
program structure. [Adri82]

STRUCTURED  PROGRAMMING:  (1)  A  well- 
defined  software  development  technique  that 
incorporates  top-down  design  and  implementa¬ 
tion  and  strict  use  of  structured  program  control 
constructs. (2) Loosely, any technique for organizing
and coding programs that reduces complexity,
improves clarity, and facilitates debugging and
modification. [IEEE83]

STRUCTURED  WALKTHROUGH:  See  WALK¬ 
THROUGH. 

STUB:  Special  code  segments  that,  when  invoked 
by  a  code  segment  under  test,  will  simulate  the 
behavior  of  designed  and  specified  modules  not 
yet  constructed.  [Adri82] 
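A minimal sketch of the idea, with hypothetical names throughout: `convert` is the code segment under test, and `exchange_rate_stub` simulates a rate-lookup module that is specified but not yet built.

```python
def exchange_rate_stub(currency):
    """Stub for an unbuilt rate-lookup module: returns canned values only."""
    canned = {"EUR": 2.0, "GBP": 4.0}
    return canned[currency]

def convert(amount, currency, rate_lookup):
    """Code segment under test; it invokes whatever lookup it is given."""
    if amount < 0:
        raise ValueError("amount must be non-negative")
    return amount * rate_lookup(currency)

# The segment can be exercised before the real module exists.
result = convert(10, "EUR", exchange_rate_stub)
```

Passing the collaborator as a parameter is one design choice that makes substitution easy; other schemes (link-time replacement, for example) serve the same purpose.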

STUB  ANALYSIS:  In  the  AdaPIC  system,  stub 
analysis  checks  the  consistency  of  multiple  views 
of  the  same  stub,  and  the  consistency  of  each  of 
these  views  against  some  authorized  specification 
of  that  module. 

SUBGOAL  INDUCTION:  A  proof  method  that  is 
applicable  to  while  statements  as  the  output  struc¬ 
ture,  and  assumes  that  the  functional  abstraction 
of  the  loop  body  is  available. 

SUBSYSTEM:  A  group  of  assemblies  or  com¬ 
ponents  or  both  combined  to  perform  a  single 
function. [IEEE83]

SUPERFLUOUS  CODE  ERROR:  An  error 
which  occurs  when  a  program  contains  code 
which  is  never  executed  or  is  redundant  for  some 
reason. 

SURVIVABILITY: (1) The probability that the
software  will  perform  and  support  critical  func¬ 
tions  in  its  intended  environment  without  failure 
when  a  specified  portion  of  the  system  is  inoper¬ 
able.  [DeMiSS].  (2)  The  probability  that  the 
software will continue to perform or support critical
functions when a portion of the system is
inoperable.  [RADC83]. 

SYMBOL CROSS-REFERENCER: A software
tool that produces dictionaries relating the symbols
used in a program by logical name.

SYMBOLIC ALTERNATIVE: Used in a
modified form of symbolic execution to
represent the effect of several mutation
transformations.

SYMBOLIC  DATA:  Symbols  used  to  represent 
actual  input  data. 

SYMBOLIC  DEBUGGING:  The  process  of  exa¬ 
mining  a  path  computation  and  path  domain  in 
order  to  obtain  information  about  the  cause  of  a 
known  fault. 

SYMBOLIC  EVALUATION:  See  SYMBOLIC 
EXECUTION. 

SYMBOLIC  EVALUATION  SYSTEM:  A 
software  tool  that  accepts  symbolic  values  for 
some  of  the  program  inputs  and  algebraically 
manipulates  these  symbols  according  to  the 
expressions  in  which  they  appear.  It  can  be  used 
to support test data generation, assertion checking,
path analysis, and detection of data flow
anomalies.

SYMBOLIC  EXECUTION:  A  verification  tech¬ 
nique  in  which  program  execution  is  simulated 
using  symbols  rather  than  actual  values  for  input 
data,  and  program  outputs  are  expressed  as  logi¬ 
cal  or  mathematical  expressions  involving  these 
symbols.  [IEEE83] 
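The technique can be sketched for straight-line code with a small class that builds expression text instead of computing numbers. The class `Sym` and the sample `program` are hypothetical illustrations; a real symbolic executor would also track path conditions at branches, which this sketch does not attempt.

```python
class Sym:
    """A symbolic value: wraps the text of an algebraic expression."""

    def __init__(self, text):
        self.text = text

    def _combine(self, op, other, reflected=False):
        other_text = other.text if isinstance(other, Sym) else repr(other)
        left, right = (other_text, self.text) if reflected else (self.text, other_text)
        return Sym(f"({left} {op} {right})")

    def __add__(self, other):
        return self._combine("+", other)

    def __radd__(self, other):
        return self._combine("+", other, reflected=True)

    def __sub__(self, other):
        return self._combine("-", other)

    def __rsub__(self, other):
        return self._combine("-", other, reflected=True)

    def __mul__(self, other):
        return self._combine("*", other)

    def __rmul__(self, other):
        return self._combine("*", other, reflected=True)

    def __repr__(self):
        return self.text

def program(a, b):
    t = a + b
    return t * 2 - a

# Executing with symbols yields the output as an expression over the inputs.
symbolic_output = program(Sym("A"), Sym("B"))
# Executing with actual values yields one concrete result.
concrete_output = program(3, 4)
```

Here `symbolic_output.text` is `(((A + B) * 2) - A)`, and substituting A = 3, B = 4 into that expression reproduces the concrete result.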


SYMBOLIC  INPUTS:  See  SYMBOLIC  DATA. 

SYMBOLIC INTERPRETATION: Where the
values  taken  on  by  variables  are  represented  as 
algebraic  expressions  that  denote  the  computa¬ 
tional  history  of  those  variables. 

SYMBOLIC  TESTING:  A  method  of  examining 
the  path  computation  and  path  condition  to 
ascertain  the  correctness  of  a  program  path. 

SYMBOLIC  VALUES:  Values  which  are  main¬ 
tained  as  algebraic  expressions  given  in  terms  of 
the  symbolic  names  assigned  to  input  values. 

SYNCHRONIZATION:  The  exchange  of  signals 
used  when  certain  processes  must  be  stopped  at  a 
given  point  until  some  event  under  the  control  of 
another  process  has  occurred. 

SYNCHRONIZATION  COVERAGE:  Member  of 
a  series  of  successively  more  stringent  testing  cov¬ 
erage  measures  for  concurrent  programs  analo¬ 
gous  to  structural  and  data  flow  testing  criteria  for 
sequential  programs.  See  also  CONCURRENCY 
STATE  COVERAGE,  STATE  TRANSITION 
COVERAGE. 

SYNCHRONIZATION  FAULT:  A  fault  which 
results  from  incorrect  sequencing  and  communi¬ 
cations  between  concurrent  processes. 

SYNTACTIC  UNIT:  A  unit  that  corresponds  to  a 
set  of  statements  in  a  program  which  define  an 
operation  upon  some  object. 

SYNTAX:  (1)  The  relationship  among  characters 
or  groups  of  characters,  independent  of  their 
meanings  or  the  manner  of  their  interpretation 
and use. (2) The structure of expressions in a
language. (3) The rules governing the structure of
a  language.  See  also  SEMANTICS.  [IEEE83] 

SYSTEM:  A  collection  of  people,  machines,  and 
methods  organized  to  accomplish  a  set  of 
specified  functions.  [IEEE83] 

SYSTEM  ARCHITECTURE:  The  structure  and 
relationship  among  the  components  of  a  system. 
A  system  architecture  may  also  include  the 
system’s interface with its operational
environment. [IEEE83]

SYSTEM COMPONENT: See COMPONENT.

SYSTEM  DESIGN:  The  process  of  defining  the 
hardware  and  software  architectures,  com¬ 
ponents,  modules,  interfaces,  and  data  for  a  sys¬ 
tem  to  satisfy  specified  system  requirements. 
[IEEE83] 

SYSTEM  HAZARD:  See  HAZARD. 

SYSTEM  INTERFACE:  See  INTERFACES. 

SYSTEM  PERFORMANCE:  See  PERFOR¬ 
MANCE. 

SYSTEM  REQUIREMENTS:  See  REQUIRE¬ 
MENTS. 

SYSTEM  SAFETY:  The  ability  of  the  system  to 
prevent  critical  failures  leading  to  unacceptable 
consequences.  Examples  of  unacceptable  conse¬ 
quences  include  the  failure  of  the  system  mission, 
and  loss  of  life  or  property. 

SYSTEM  SECURITY:  See  SECURITY. 

SYSTEM  ROBUSTNESS:  See  ROBUSTNESS. 

SYSTEM  SPECIFICATION:  See  SPECIFICA¬ 
TION. 


TASK  COMMUNICATION:  See  PROCESS 
COMMUNICATION. 

TASK  ENTRY  FAMILY:  An  entry  declaration 
for  a  task  which  includes  a  discrete  range  and  so 
declares  a  family  of  distinct  entries. 

TASK  SEQUENCING  FAULT:  A  fault  which 
occurs  when  a  program’s  tasks  interact  in  a 
different  order  than  anticipated. 

TASK  SEQUENCING  LANGUAGE:  A  language 
used  to  annotate  Ada  programs  by  specifying 
constraints  to  be  satisfied  by  sequences  of  task 
events.  These  constraints  can  be  transformed 
into  dynamic  checks  for  certain  types  of  faults 
and  failures. 

TEMPORAL  LOGIC:  A  logic  theory  with  tem¬ 
poral  quantifiers  (for  example  henceforth  and 
eventually),  which  permits  statements  about  tem¬ 
poral  conditions  to  be  made. 
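The two quantifiers named above can be approximated over a finite recorded trace, as in the sketch below. This is an assumption made for illustration: temporal logic is normally interpreted over unbounded executions, and the function names and the request/acknowledge trace are hypothetical.

```python
def henceforth(pred, trace):
    """'Henceforth p': p holds in every state of the finite trace."""
    return all(pred(state) for state in trace)

def eventually(pred, trace):
    """'Eventually p': p holds in at least one state of the finite trace."""
    return any(pred(state) for state in trace)

def leads_to(p, q, trace):
    """Henceforth (p implies eventually q), checked on each suffix of the trace."""
    return all(
        (not p(trace[i])) or eventually(q, trace[i:])
        for i in range(len(trace))
    )

trace = [
    {"req": False, "ack": False},
    {"req": True, "ack": False},
    {"req": False, "ack": True},
]

def is_req(state):
    return state["req"]

def is_ack(state):
    return state["ack"]

responds = leads_to(is_req, is_ack, trace)
```

On this trace every request is followed by an acknowledgement, so `responds` is true; truncating the trace after the second state would make it false.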

TERMINATION:  The  act  of  finishing  a  program 
or  a  proof. 

TEST:  A  unit  test  of  a  single  module  consists  of 
(1)  a  collection  of  settings  for  the  input  space  of 
the  module,  and  (2)  exactly  one  invocation  of  the 
module.  A  unit  test  may  or  may  not  include  the 
effect  of  other  modules  which  are  invoked  by  the 
module undergoing testing. [Mill81]


SYSTEM  TESTING:  The  process  of  testing  an 
integrated  hardware  and  software  system  to  verify 
that  the  system  meets  its  specified  requirements. 
[IEEE83] 


TAP:  A  debugger  designed  to  detect  timing  faults 
caused  by  the  misordering  of  events  in  a  distri¬ 
buted  system. 

TARGET ENVIRONMENT: See TARGET
MACHINE.

TARGET  MACHINE:  The  computer  on  which  a 
program  is  intended  to  operate.  Contrast  with 
HOST  MACHINE.  [IEEE83] 

TASK:  See  PROCESS. 


TEST  AND  EVALUATION  (T&E):  A  formal 
testing  process  used  to  evaluate  the  technical  and 
operational  characteristics  of  a  system.  Per¬ 
formed  in  a  number  of  stages,  for  example, 
QUALIFICATION  TESTING,  DEVELOPMEN¬ 
TAL  TEST  AND  EVALUATION,  INITIAL 
OPERATIONAL  TEST  AND  EVALUATION, 
OPERATIONAL  TEST  AND  EVALUATION, 
FOLLOW-ON  OPERATIONAL  TEST  AND 
EVALUATION. 

TEST  BED:  (1)  A  test  environment  containing 
the  hardware,  instrumentation  tools,  simulators, 
and  other  support  software  necessary  for  testing  a 
system  or  system  component.  (2)  The  repertoire 
of  test  cases  necessary  for  testing  a  system  or  sys¬ 
tem  component.  [IEEE83] 

TEST CASE: See TEST DATA SET.

TEST  DATA:  See  TEST  DATA  SET. 

TEST DATA ADEQUACY: See ADEQUATE
TEST DATA.

TEST  DATA  GENERATOR:  An  automated  tool 
that  accepts  as  input  a  computer  program  and  test 
criteria,  generates  test  input  data  that  meet  these 
criteria,  and,  sometimes,  determines  the 
expected  outputs.  [IEEE83] 

TEST  DATA  SELECTION  STRATEGY:  Pro¬ 
vides  guidance  for  selecting  test  data  for  a  pro¬ 
gram;  for  example,  a  branch  testing  test  data 
selection  strategy  selects  data  that  cause  those 
program  paths  to  be  executed  such  that  each 
branch  is  executed  at  least  once. 

TEST  DATA  SET:  A  specific  set  of  input  and  out¬ 
put  values  for  the  variables  in  the  communication 
space  of  a  module  that  are  used  in  a  test.  Also 
called  a  test  case. 

TEST  DRIVER:  A  program  that  directs  the  exe¬ 
cution  of  another  program  against  a  collection  of 
test  data  sets.  Usually  the  test  driver  also  records 
and  organizes  the  output  generated  as  the  tests 
are  run.  [Adri82] 
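A small sketch of such a driver follows. The function `run_tests`, the record layout, and the sample program `integer_divide` are hypothetical; each test data set is assumed to be a pair of input values and the expected output.

```python
def run_tests(func, test_data_sets):
    """Execute func against each test data set and record the outcome."""
    results = []
    for inputs, expected in test_data_sets:
        try:
            actual = func(*inputs)
            status = "pass" if actual == expected else "fail"
        except Exception as exc:
            # An exception is recorded rather than allowed to stop the run.
            actual, status = repr(exc), "error"
        results.append(
            {"inputs": inputs, "expected": expected, "actual": actual, "status": status}
        )
    return results

def integer_divide(a, b):
    return a // b

test_data_sets = [
    ((7, 2), 3),   # expected output is correct
    ((9, 3), 2),   # expected output is deliberately wrong
    ((1, 0), 0),   # raises ZeroDivisionError
]

results = run_tests(integer_divide, test_data_sets)
statuses = [record["status"] for record in results]
```

The recorded results can then be organized into a report; here `statuses` is `["pass", "fail", "error"]`.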

TEST  GRAMMAR:  A  context-free  grammar 
which  describes  those  aspects  of  a  program  to  be 
tested,  as  well  as  the  assumptions  as  to  which  test 
cases  are  considered  equivalent.  The  grammar 
generates  test  data  in  levels  of  ever  increasing 
complexity  of  test  cases.  At  each  level  the  pro¬ 
grammer  may  use  the  results  of  testing  at  previous 
levels  to  strengthen  the  assumptions  on  the  test 
grammar,  thereby  reducing  the  number  of  test 
cases  generated  at  subsequent  levels.  [DACS79] 

TEST  HARNESS:  See  TEST  DRIVER. 

TEST  INSTRUMENTORS:  Automated  tools  that 
produce  an  altered  version  of  a  program  or  com¬ 
ponent  that  is  logically  equivalent  to  the 
unmodified  program  but  contains  calls  to  special 
data  collection  routines  that  record  information 
pertaining  to  the  execution  behavior  of  the  pro¬ 
gram. 


TEST  MANAGEMENT:  Management  procedures 
designed  to  control  in  an  ordered  way  a  large  and 
evolving  amount  of  pieces  of  information  on  sys¬ 
tem  features  to  be  tested,  on  system  implementa¬ 
tion  plans,  and  on  test  results.  [DACS79] 

TEST  PATH:  The  specific  (sequence)  set  of  seg¬ 
ments  that  is  traversed  as  the  result  of  a  unit  test 
operation  on  a  set  of  test  data.  A  module  can 
have many test paths. [Mill81]

TEST  PLAN:  A  document  prescribing  the 
approach  to  be  taken  for  intended  testing  activi¬ 
ties.  The  plan  typically  identifies  the  items  to  be 
tested,  the  testing  to  be  performed,  test 
schedules,  personnel  requirements,  reporting 
requirements,  evaluation  criteria,  and  any  risks 
requiring  contingency  planning.  [IEEE83] 

TEST  POINT:  A  tuple  containing  a  value  for 
each  program  input. 

TEST  REPEATABILITY:  An  attribute  of  a  test 
indicating  whether  the  same  results  are  produced 
each  time  the  test  is  conducted.  [IEEE83] 

TESTABILITY:  (1)  The  extent  to  which  software 
facilitates  both  the  establishment  of  test  criteria 
and  the  evaluation  of  the  software  with  respect  to 
these  criteria.  (2)  The  extent  to  which  the 
definition  of  requirements  facilitates  analysis  of 
the  requirements  to  establish  test  criteria. 
[IEEE83]

TESTING:  The  process  of  exercising  or  evaluat¬ 
ing  a  system  or  system  component  by  manual  or 
automated  means  to  verify  that  it  satisfies 
specified  requirements  or  to  identify  differences 
between  expected  and  actual  results.  Contrast 
with  DEBUGGING.  [IEEE83] 

TESTING  COVERAGE  MEASURE:  In  general, 
a  measure  of  the  testing  coverage  achieved  as  a 
result  of  a  test,  often  expressed  as  a  percentage  of 
the  number  of  statements,  branches,  or  paths 
that were traversed. [Mill81]

TESTING  ENVIRONMENT:  A  collection  of 
software  tools  to  assist  the  user  in  planning,  con¬ 
ducting,  and  reporting  on  testing  activities. 

TESTING, EVALUATION, AND ANALYSIS
MEDLEY:  A  testing  environment  under  develop¬ 
ment  at  the  University  of  California  (Irvine) 
which  will  support  incremental  and  integrated 
application  of  a  number  of  different  dynamic  and 
static  analysis  techniques  to  Ada  programs.  It  is 
expected  to  become  part  of  the  Arcadia  software 
development  environment. 

TESTING  TARGET:  The  current  module  (system 
testing)  or  current  segment  (unit  testing)  upon 
which testing effort is focused. [Mill81]

THEOREM  PROVERS:  Tools  to  mechanize  the 
process  of  producing  a  formal  proof. 

THRESHOLD  VALUES:  Values  of  technical  or 
operational  properties  and  parameters  below 
which  the  overall  system  worth  will  be  unaccept¬ 
able. 

TIME  DOMAIN  MODELS:  Software  reliability 
models in which reliability is considered a function
of time. Include times-between-failures
models and failure-count models.

TIMES  BETWEEN  FAILURES  MODEL: 
Software  reliability  model  in  which  the  time 
between  failures  is  treated  as  a  random  variable 
whose  parameters  depend  on  the  number  of 
faults  remaining  in  the  program.  [Goel85]. 

TIMING GRAPH: A directed acyclic graph
representing  the  partial  ordering  of  events  for  a 
distributed  program. 
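One way to picture this is as an adjacency map in which an edge means "occurs before". The sketch below is an assumption-laden illustration: the event names, the dictionary representation, and the helpers `happens_before` and `concurrent` are all hypothetical.

```python
def happens_before(graph, a, b):
    """True if b is reachable from a through one or more edges."""
    stack, seen = list(graph.get(a, ())), set()
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False

def concurrent(graph, a, b):
    """Two distinct events are concurrent if the partial order relates them in neither direction."""
    return a != b and not happens_before(graph, a, b) and not happens_before(graph, b, a)

# Two processes; the message from P1 to P2 orders P1.send before P2.recv.
timing_graph = {
    "P1.send": ["P1.done", "P2.recv"],
    "P2.start": ["P2.recv"],
    "P2.recv": ["P2.done"],
}
```

In this graph `P1.send` precedes `P2.done`, while `P1.done` and `P2.recv` are unordered, which is the kind of pair a timing-fault debugger would need to examine.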

TOOL:  (1)  See  SOFTWARE  TOOL.  (2)  A 
hardware  device  used  to  analyze  software  or  its 
performance. 

TOP  DOWN  TESTING  STRATEGY:  A  sys¬ 
tematic  testing  philosophy  which  seeks  to  test 
those  modules  at  the  top  of  the  invocation  struc¬ 
ture  earliest.  [Mill81] 

TOTAL  CORRECTNESS:  In  proof  of  correct¬ 
ness,  a  designation  indicating  that  a  program’s 
output  assertions  follow  logically  from  its  input 
assertions  and  processing  steps,  and  that,  in  addi¬ 
tion,  the  program  terminates  under  all  specified 
input conditions. [IEEE83]


TRACE:  See  PROGRAM  TRACE. 

TRACE  MUTATION  TESTING:  A  form  of  muta¬ 
tion  testing  where  certain  classes  of  program 
traces  rather  than  output  values  are  used  for  dis¬ 
tinguishing  between  a  program  and  its  mutants. 
This  eliminates  the  need  for  assumptions  such  as 
the  Coupling  Effect  and  allows  repeated  applica¬ 
tions  of  mutation  transformations. 

TREE: An abstract hierarchical structure consisting
of nodes connected by branches, in which: (a)
each  branch  connects  one  node  to  a  directly  sub¬ 
sidiary  node,  and  (b)  there  is  a  unique  node  called 
the  root  that  is  not  subsidiary  to  any  other  node, 
and  (c)  every  node  besides  the  root  is  directly 
subsidiary  to  exactly  one  other  node.  [IEEE83] 

TRUSTWORTHINESS  OF  SOFTWARE:  Proba¬ 
bility  that  no  serious  [software]  design  error 
remains  after  a  set  of  randomly  chosen  tests  [have 
been]  passed.  [Pam88]. 

TYPE  ANALYSIS:  A  form  of  static  error  analysis 
involving  the  determination  of  correct  use  of 
named  data  items  and  operations.  Usually,  type 
analysis is used to determine whether or not the
domain of values (functions, etc.) is attributed to
an entity in a correct and consistent
manner.


UNIFORM:  All  possible  values  or  selections 
occur  with  equal  probability.  [Musa87] 

UNIFORMITY  HYPOTHESIS:  The  uniformity 
hypothesis  consists  in  assuming  that  if  the  test  is 
successful  for  one  datum  in  a  subdomain  then  the 
program  behaves  correctly  for  any  data  in  this 
subdomain. 

UNIT TEST: See TEST.

UNIT  TESTING:  The  process  of  testing  each  unit 
in isolation. See also INTEGRATION TESTING,
SYSTEM TESTING.

UNITS  ANALYSIS:  Units  analysis  determines 
whether  or  not  the  units  or  physical  dimensions 
attributed  to  an  entity  are  correctly  defined  and 
consistently  used. 
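The consistency rule can be sketched as follows. For brevity the check is made at run time; a static analyzer would apply the same rule to declared units without executing the program. The `Quantity` class and its dictionary encoding of dimensions are hypothetical.

```python
class Quantity:
    """A value tagged with physical dimensions, e.g. {'m': 1, 's': -1} for metres per second."""

    def __init__(self, value, dims):
        self.value = value
        # Drop zero exponents so that equal dimensions compare equal.
        self.dims = {unit: power for unit, power in dims.items() if power != 0}

    def __add__(self, other):
        # Addition is only meaningful between like dimensions.
        if self.dims != other.dims:
            raise TypeError(f"cannot add {self.dims} to {other.dims}")
        return Quantity(self.value + other.value, self.dims)

    def __mul__(self, other):
        dims = dict(self.dims)
        for unit, power in other.dims.items():
            dims[unit] = dims.get(unit, 0) + power
        return Quantity(self.value * other.value, dims)

    def __truediv__(self, other):
        dims = dict(self.dims)
        for unit, power in other.dims.items():
            dims[unit] = dims.get(unit, 0) - power
        return Quantity(self.value / other.value, dims)

distance = Quantity(100.0, {"m": 1})
duration = Quantity(20.0, {"s": 1})
speed = distance / duration
```

Here `speed` carries dimensions of metres per second, and an attempt to evaluate `distance + duration` is rejected as inconsistent.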

UNREACHABILITY: A statement (or segment)
is unreachable if there is no logically obtainable
set of input-space settings which can cause the
statement (or segment) to be traversed. [Mill81]

UNSAFE  STATE:  A  state  which  may  lead  to  a 
safety  failure  unless  some  specific  action  is  taken 
to  avert  it. 

UPDATE  ANALYSIS:  In  the  AdaPIC  system, 
update  analysis  compares  two  versions  of  the 
same submodule to look for changes in declarations,
requisition/provision specifications, or
references to non-local entities.

USER  INTERACTION  MODEL:  A  model  which 
defines  the  possible  user  interaction  with  a 
software  system  or  tool. 


VAL:  A  formal  language  for  specifying  the 
behavior  of  hardware  designs  whose  architectures 
are  specified  in  VHDL.  It  provides  the  capability 
for  automatic  comparison  of  behavior  of  different 
levels  of  a  VHDL  hierarchical  design  during 
simulation. 

VALIDATED  Ada  COMPILER:  An  Ada  com¬ 
piler  that  has  been  determined  by  the  Ada  Joint 
Program Office to compile Ada source code in
accordance with the language specification given
in the Ada Language Reference Manual.

VALIDATED  METRIC:  A  software  quality 
metric  whose  values  have  a  specified  association 
with  the  corresponding  values  of  a  designated 
quality  factor  or  with  the  values  of  a  valid  metric 
of  that  factor,  when  the  two  sets  of  metric  values 
are  obtained  from  the  same  domain  (e.g.,  the 
same  software  components).  [IEEE88] 

VALIDATION:  (1)  The  process  of  evaluating 
software  at  the  end  of  the  software  development 
process  to  ensure  compliance  with  software 
requirements.  [IEEE83]  (2)  Static  and  dynamic 
analysis of a software product to ensure it attains
the  features  and  performance  attributes 
prescribed  by  its  requirements. 

VARIABLE:  (1)  A  quantity  that  can  assume  any 
of a given set of values. (2) In programming, a
character or group of characters that refers to a
value  and,  in  the  execution  of  a  computer  pro¬ 
gram,  corresponds  to  an  address.  [IEEE83] 

VARIABLE ASSIGNMENT: An expression
which  assigns  a  value  to  a  variable. 

VARIABLE DEFINITION: A program statement
which  defines  a  variable  and  its  allowable  usage. 

VARIABLE  NAME:  An  identifier  allocated  to  a 
variable  for  purposes  of  reference.  See  also 
VARIABLE. 

VARIABLE  REFERENCE:  Accessing  a  value 
from  a  variable. 

VARIABLE UNDEFINITION: Causing the value
of  a  variable  to  become  undefined;  for  example, 
when  the  program  control  flow  passes  beyond  the 
scope  of  a  variable. 

VARIABLE  USAGE  ERROR:  A  programming 
anomaly  arising  from  the  erroneous  usage  of  vari¬ 
ables;  for  example,  a  reference  to  an  undefined 
variable,  the  definition  of  a  variable  which  is 
never  referenced,  or  a  dead  variable  definition 
where  a  variable  is  defined  twice  without  an  inter¬ 
vening  reference. 
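The three example anomalies can be detected by scanning the sequence of definitions, references, and undefinitions along one program path. The sketch below assumes a hypothetical event encoding (`'d'`, `'r'`, `'u'`) and a hypothetical function `find_anomalies`; it checks a single path and is not a full data flow analyzer.

```python
def find_anomalies(events):
    """Scan (action, variable) events along one path and report usage anomalies.

    action is 'd' (definition), 'r' (reference) or 'u' (undefinition).
    """
    last_action = {}
    anomalies = []
    for index, (action, var) in enumerate(events):
        previous = last_action.get(var, "u")  # every variable starts undefined
        if action == "r" and previous == "u":
            anomalies.append((index, var, "reference to undefined variable"))
        elif action == "d" and previous == "d":
            anomalies.append((index, var, "redefinition without intervening reference"))
        elif action == "u" and previous == "d":
            anomalies.append((index, var, "definition never referenced"))
        last_action[var] = action
    return anomalies

path_events = [
    ("d", "x"), ("d", "x"), ("r", "x"),   # x is defined twice before any reference
    ("r", "y"),                            # y is referenced while undefined
    ("d", "z"), ("u", "z"),                # z is defined but leaves scope unreferenced
]

anomalies = find_anomalies(path_events)
```

For the sample path the scan reports one anomaly of each kind, at event positions 1, 3, and 5.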

VERIFIABILITY:  The  adequacy  with  which  a 
given  algorithm  represents  the  requirements  of 
the  physical  world.  [RADC83]. 

VERIFICATION:  (1)  The  process  of  determining 
whether  or  not  the  products  of  a  given  phase  of 
the software development cycle fulfill the requirements
established during the previous phase. (2) Formal proof of
program correctness. (3) The act of reviewing,
inspecting,  testing,  checking,  auditing,  or  other¬ 
wise  establishing  and  documenting  whether  or  not 
items,  processes,  services,  or  documents  con¬ 
form  to  specified  requirements.  [IEEE83] 

VERIFICATION  CONDITION  GENERATOR: 
A  program  that  generates  sets  of  logical  condi¬ 
tions  that  must  be  proven  in  order  to  verify 
software. 

VERIFICATION  SYSTEM:  A  software  tool  that 
accepts  as  input  a  computer  program  and  a 
representation  of  its  specification,  and  produces, 
possibly with human help, a correctness proof or
disproof  of  the  program.  [IEEE83] 

VHDL: A high-level, wide-spectrum hardware
design language designed to support the development
of distributed systems.

VISIBILITY: The visibility of a variable refers to
those  locations  within  a  program  where  the  vari¬ 
able  is  available  for  reference.  The  visibility  is 
determined  by  the  declaration,  scope,  and  bind¬ 
ing  rules  given  in  a  programming  language. 

WALKTHROUGH:  A  manual  review  process  in 
which  the  designer  or  programmer  leads  one  or 
more  other  members  of  the  development  team 
through  a  segment  of  design  or  code  that  he  or 
she  has  written,  while  the  other  members  ask 
questions  and  make  comments  about  technique, 
style,  possible  errors,  violation  of  development 
standards, and other problems. [IEEE83]

WATERFALL  SOFTWARE  DEVELOPMENT 
LIFE  CYCLE:  A  discipline  of  software  develop¬ 
ment  which  proceeds  in  a  series  of  discrete  steps. 
The  standard  DOD  ordering  of  these  steps  is  as 
follows:  software  requirements  analysis,  prelim¬ 
inary  design,  detailed  design,  coding  and  unit 
testing,  integration  testing,  and  system  testing. 

WEAK  MUTATION  TESTING:  A  form  of  muta¬ 
tion  testing  where  mutation  transformations  are 
applied  to  program  components  rather  than  a 
program as a whole. Unlike mutation testing in its original
form,  this  technique  does  not  guarantee  exposure 
of  all  faults  in  the  class  of  faults  associated  with 
mutation  transformations  but  does  allow  repeated 
application  of  mutation  transformations  for  a  sin¬ 
gle  test. 
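The difference between component-level and whole-program comparison can be sketched as below. Everything here is a hypothetical illustration: the component is a single addition, the mutation transformation replaces `+` with `-`, and `program` stands for later processing that can mask the difference.

```python
def original_component(a, b):
    return a + b

def mutant_component(a, b):
    return a - b  # mutation transformation: '+' replaced by '-'

def program(component, a, b):
    s = component(a, b)
    return s > 0  # later processing may hide a difference in s

def weakly_killed(test):
    """Weak mutation: compare the component's value immediately after it executes."""
    return original_component(*test) != mutant_component(*test)

def strongly_killed(test):
    """Original form: compare the outputs of the program as a whole."""
    return program(original_component, *test) != program(mutant_component, *test)

tests = [(5, 1), (1, 5), (3, 0)]
outcomes = {test: (weakly_killed(test), strongly_killed(test)) for test in tests}
```

The test `(5, 1)` distinguishes the mutant at the component but not at the program output, which shows why a weak-mutation verdict does not guarantee that the corresponding fault would be exposed by the complete program.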

WHITE  BOX  TESTING:  Testing  approaches 
which  examine  the  program  structure  and  derive 
test  data  from  the  program  logic. 

WIDE-SPECTRUM LANGUAGE: A language
which  can  serve  several  purposes;  for  example, 
can  be  used  in  a  series  of  successive  software 
development  phases. 

WORST CASE ANALYSIS: Analysis that
assumes the worst-case conditions for every
parameter under study.

APPENDIX B:

ACRONYMS 

ACM       Association for Computing Machinery
ACVC      Ada Compiler Validation Capability
AdaMAT    Ada Metrics and Analysis Tool
AI        Artificial Intelligence
AJPO      Ada Joint Program Office
ANNA      ANNotated Ada
AT&T      American Telephone & Telegraph
ATVS      Ada Test and Verification System
AVA       Annotated Verifiable Ada
BIT       Built-in-test
BM/C3     Battle Management/Command, Control, and Communication
C3        Command, Control, and Communication
COCOMO    COnstructive COst MOdel
CPU       Central Processing Unit
CSC       Computer Software Component
CSCI      Computer Software Configuration Item
CSP       Communicating Sequential Processes
CSU       Computer Software Unit
DACS      Data and Analysis Center for Software
DAISTS    Data-Abstraction Implementation, Specification, and Testing System
DARPA     Defense Advanced Research Projects Agency
DBMS      DataBase Management System
DIANA     Descriptive Intermediate Attributed Notation for Ada
DOD       Department of Defense
DT&E      Developmental Test and Evaluation
EIA       Electronic Industries Association
ESTCA     Error Sensitive Test Case Analysis
FDM       Formal Development Methodology
FOT&E     Follow-on Operational Test and Evaluation
FSD       Full Scale Development
GFE       Government Furnished Equipment
GVE       Gypsy Verification Environment
HDL       Hardware Design Language
HDM       Hierarchical Development Methodology
HOL       High Order Language
IDA       Institute for Defense Analyses
IEEE      Institute of Electrical and Electronics Engineers
I/O       Input/Output
IOT&E     Initial Operational Test and Evaluation
IV&V      Independent Verification and Validation
LCSAJ     Linear Code Sequence and Jump
LOC       Lines of Code
MIMD      Multiple-Instruction, Multiple-Data streams
MTBF      Mean Time Between Faults
MTTF      Mean Time to Failure
MTTR      Mean Time to Repair
NASA      National Aeronautics and Space Administration

NCSC

NOSC 

NTB 

NTDS 

NTF 

PDL 

OT&E 

OSD 

QT 

R&D 

RADC 

RAM 

SADMT 

SCA 

SDI 

SDIO 

SDS 

SEL 

SIMD 

SMDC 

SOIF 

SSDC 